(54) Title: GENERATION AND SELECTION OF RECOMBINANT VARIED BINDING PROTEINS
(57) Abstract 

In order to obtain a novel binding protein against a chosen target, DNA molecules, each encoding a protein comprising
one of a family of similar potential binding domains and a structural signal calling for the display of the protein on the outer
surface of a chosen bacterial cell, bacterial spore or phage (genetic package), are introduced into a genetic package. The protein is
expressed and the potential binding domain is displayed on the outer surface of the package. The cells or viruses bearing the
binding domains which recognize the target molecule are isolated and amplified. The successful binding domains are then
characterized. One or more of these successful binding domains is used as a model for the design of a new family of potential binding
domains, and the process is repeated until a novel binding domain having a desired affinity for the target molecule is obtained. In
one embodiment, the first family of potential binding domains is related to bovine pancreatic trypsin inhibitor, the genetic
package is M13 phage, and the protein includes the outer surface transport signal of the M13 gene III protein.
GENERATION AND SELECTION OF RECOMBINANT
VARIEGATED BINDING PROTEINS 



Field of the Invention

This invention relates to development of novel 
binding proteins by an iterative process of 
mutagenesis, expression, chromatographic selection, and 
amplification.

Information Disclosure Statement 

The amino acid sequence of a protein determines
its three-dimensional (3D) structure, which in turn
determines protein function (EPST63, ANFI73). The
system of classification of protein structure of Schulz
and Schirmer (SCHU79, ch. 5) is adopted herein.

The 3D structure of a protein is essentially
unaffected by the identity of the amino acids at some
loci; at other loci only one or a few types of amino
acid are allowed (SHOR85, EISE85, REID88). Generally,
loci where wide variety is allowed have the amino acid
side group directed toward the solvent, while limited
variety is allowed where the side group is directed
toward other parts of the protein (see also SCHU79,
p169-171, and CREI84, p239-245, 314-315).

The secondary structure (helices, sheets, turns,
loops) of a protein is determined mostly by local
sequence. Certain amino acids tend to be correlated
with certain secondary structures, and the commonly used
Chou-Fasman (CHOU74, CHOU78a, CHOU78b) rules depend on
these correlations. However, every amino acid type has
been observed in helices and in both parallel and
antiparallel sheets. Pentapeptides of identical
sequence are found in different proteins; in some cases
the conformations of the pentapeptides are very
different (KABS84, ARGO87).

Turns and loops tolerate insertions and deletions
more readily than do other secondary structures
(RICH81, SUTC87a); related proteins differ most
in loops and turns.

Changing three residues in subtilisin from
Bacillus amyloliquefaciens to be the same as the
corresponding residues in subtilisin from B.
licheniformis produced a protease that had nearly the
same activity as the subtilisin from the latter
organism; 82 differences remained in the sequences.
The three residues changed were chosen because they
were the only differences within 7 Angstroms (Å) of the
active site (WELL87a).

Schulz and Schirmer summarize many observations on
the binding of proteins to other molecules (SCHU79,
p98-107). For example, haemoglobin alpha chains bind
very tightly to haemoglobin beta chains (delta G more
negative than -11.0 kcal/mole); antibodies bind tightly
to antigens (K_D values range from 10^-7 to 10^-11 M, where K_D
is the dissociation constant, equal to [A][B]/[A:B]); basic
bovine pancreatic trypsin inhibitor (BPTI) binds
tightly to trypsin (K_D = 6.0 x 10^-14 M (TSCH82), delta
G = -18.0 kcal/mole); and avidin binds to biotin (K_D =
1.3 x 10^-15 M (CREI84, p362)). In each case the
binding results from complementarity of the surfaces
that come into contact: bumps fit into holes, unlike
charges come together, dipoles align, and hydrophobic
atoms contact other hydrophobic atoms. Although bulk
water is excluded, individual water molecules are
frequently found filling space in intermolecular
interfaces; these waters usually form hydrogen bonds to
one or more atoms of the protein or to other bound
water.
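The free energies quoted above follow from the dissociation constants through delta G = RT ln K_D, with K_D in mol/L relative to a 1 M standard state. The Python sketch below converts in both directions; it assumes a temperature of 25 °C (298.15 K), which the text does not state, and the function names are illustrative only.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3
# Assumed temperature (25 degrees C); the source does not state one.
T_KELVIN = 298.15


def delta_g_from_kd(kd_molar: float, temp_k: float = T_KELVIN) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    delta G = RT ln(Kd), with Kd relative to a 1 M standard state, so a
    smaller Kd (tighter binding) gives a more negative delta G.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_delta_g(dg_kcal: float, temp_k: float = T_KELVIN) -> float:
    """Inverse of delta_g_from_kd: the Kd (M) implied by a binding free energy."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    # BPTI:trypsin, Kd = 6.0e-14 M (TSCH82)
    print(f"BPTI:trypsin  {delta_g_from_kd(6.0e-14):6.1f} kcal/mol")
    # Avidin:biotin, Kd = 1.3e-15 M (CREI84)
    print(f"avidin:biotin {delta_g_from_kd(1.3e-15):6.1f} kcal/mol")
    # Haemoglobin alpha:beta, delta G more negative than -11.0 kcal/mol
    print(f"Kd at -11.0 kcal/mol: {kd_from_delta_g(-11.0):.1e} M")
```

At the assumed temperature the BPTI:trypsin constant gives about -18.0 kcal/mol, matching the figure cited, and avidin:biotin comes out near -20.3 kcal/mol.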

The factors affecting protein binding are known
(CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch. 8), but
designing new complementary surfaces has proved
difficult. Although some rules have been developed for
substituting side groups (SUTC87b), the side groups of
proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the
forces that bind proteins to other molecules are all
relatively weak and it is difficult to predict the
effects of these forces. Hence, it is difficult to
design superior binding proteins based on theory alone
(QUIO87).

Enzyme-substrate affinity, however, has
fortuitously been increased by protein engineering
(WILK84). A point mutant of tyrosyl tRNA synthetase of
Bacillus stearothermophilus exhibits a 100-fold
increase in affinity for ATP. Substitution of one
amino acid for another at a surface locus may
profoundly alter binding properties of the protein
other than substrate binding, without affecting the
tertiary structure of the protein. For example, in
sickle-cell haemoglobin the change of the surface
residue E6 to V in the beta chains causes
deoxyhaemoglobin-S to form fibers through self-binding
(DICK83, p125-145); the tertiary and quaternary
structure of the haemoglobin are not changed (PADL85,
WISH75, WISH76).
Changing a single amino acid in BPTI greatly
reduces its binding to trypsin, but some of the new
molecules retain the parental characteristics of
binding to and inhibiting chymotrypsin, while others
exhibit new binding to elastase (TANK77, TSCH87).
Changes of single amino acids on the surface of the
lambda Cro repressor greatly reduce its affinity for
the natural operator O_R3, but greatly increase the
binding of the mutant protein to a mutant operator
(EISE85). Thus changing the surface of a binding
protein may alter its specificity without abolishing
binding activity.

The recently developed techniques of "reverse
genetics" have been used to produce single specific
mutations at precise base pair loci (OLIP86, OLIP87,
and AUSU87). Mutations are generally detected by
sequencing and in some cases by loss of wild-type
function. These procedures allow researchers to
analyze the function of each residue in a protein
(MILL85) or of each base pair in a regulatory DNA
sequence (CHEN88). In these analyses, the norm has
been to strive for the classical goal of obtaining
mutants carrying a single alteration (AUSU87).

Reverse genetics is often applied to coding
regions to determine which residues are most important
to protein structure and function; isolation of a
single mutant at each residue of the protein gives an
initial estimate of which residues play crucial roles.

Prior to the method of the present invention, two
general approaches have been developed to create novel
mutant proteins through reverse genetics. In one
approach, dubbed "protein surgery" (DILL87), a specific
substitution is introduced at a single protein residue
to determine the effects on structure and function of
specific substitutions (CRAI85, ...). However, many
desirable protein alterations require
multiple amino acid substitutions and thus are not
accessible through single base changes or even through
all possible amino acid substitutions at any one
residue.



The other approach has been to generate randomly a
variety of mutants at many loci within a cloned gene
using mutagenic chemicals or radiation. The specific
location and nature of the change are determined by DNA
sequencing (PAKU86). This approach is limited by the
number of colonies that can be examined. Also, it does
not take advantage of any knowledge of the protein
structure and its relationship to binding activity.

Progress toward rules governing substitutions of
amino acids (ULME83) has been greatly hampered by the
extensive effort involved in using either method and
the practical limitations on the number of colonies
that can be inspected (ROBE86).






The term "saturation mutagenesis" with reference
to synthetic DNA is generally taken to mean generation
of a population in which: a) every possible single-base
change within a fragment of a gene or DNA regulatory
region is represented, and b) most mutant genes contain
only one mutation. Thus a set of all possible single
mutations for a 6 base pair length of DNA comprises a
population of 18 mutants. Oliphant et al. (OLIP86) and
Oliphant and Struhl (OLIP87) have demonstrated ligation
and cloning of highly degenerate oligonucleotides and
have applied saturation mutagenesis to the study of
promoter sequence and function. They suggest that
similar methods could be used to study genetic
expression of proteins, but they do not say how to: a)
choose protein residues to vary, or b) select or screen
mutants with desirable properties.
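The count of 18 follows from 6 positions times 3 alternative bases per position. A short Python sketch that enumerates every single-base substitution of a DNA fragment makes this explicit; the 6 bp fragment used here is a hypothetical example, not one taken from OLIP86 or OLIP87.

```python
BASES = "ACGT"


def single_base_mutants(seq: str) -> list[str]:
    """All sequences that differ from `seq` by exactly one base substitution."""
    mutants = []
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                mutants.append(seq[:i] + base + seq[i + 1:])
    return mutants


if __name__ == "__main__":
    parent = "TATAAT"  # hypothetical 6 bp fragment
    mutants = single_base_mutants(parent)
    print(len(mutants))  # 6 positions x 3 alternative bases = 18
```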

Reidhaar-Olson and Sauer (REID88) have used
synthetic degenerate oligonucleotides to vary simultaneously
two or three residues through all twenty amino acids in
the dimer interface of cI repressor from bacteriophage
lambda. They give no discussion of the limits on how
many residues could be varied at once, nor do they
mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for
proteins that either had wild-type dimerization or that
did not dimerize. They did not seek proteins having
novel binding properties and did not report any.
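The unequal abundance problem can be seen by counting codons per amino acid in the standard genetic code. With a fully degenerate NNN codon, Leu, Ser and Arg each get 6 of the 64 codons while Trp and Met get 1, and 3 codons are stops. Restricting the third base to G or T (the "NNK" scheme, a common choice used here purely as an illustration and not named in this passage) still encodes all 20 amino acids but narrows the bias to 3:1 with a single stop codon. A Python sketch:

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_counts(third_position: str = "TCAG") -> Counter:
    """Codons per amino acid when positions 1-2 are fully degenerate (N)
    and position 3 is restricted to the bases in `third_position`."""
    counts = Counter()
    for b1, b2, b3 in product(BASES, BASES, third_position):
        counts[CODON_TABLE[b1 + b2 + b3]] += 1
    return counts


if __name__ == "__main__":
    nnn = amino_acid_counts("TCAG")  # NNN: 64 codons
    nnk = amino_acid_counts("GT")    # NNK (K = G or T): 32 codons
    print("aa  NNN  NNK")
    for aa in "LSRWM*":
        print(f"{aa:>2} {nnn[aa]:4d} {nnk[aa]:4d}")
```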

Several researchers have designed and synthesized
proteins de novo. These designed proteins are small
and most have been synthesized in vitro as polypeptides
rather than genetically. Gutte and colleagues have
made a polypeptide that binds DDT in 55% ethanol
(MOSE83). Recently Moser et al. (MOSE87) reported
genetic expression in E. coli both of the designed
24-residue DDT-binding protein and of fusions of the
DDT-binding sequence to LacZ. They state that design of
biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and
synthesized a series of proteins that they have named
betabellins, which are meant to have beta sheets. They
suggest use of polypeptide synthesis with mixed
reagents to produce several hundred analogous
betabellins, and use of a column to recover analogues
with high affinity for a chosen target compound bound
to the column. They envision successive rounds of
mixed synthesis of variant proteins and purification by
specific binding. They do not discuss how residues
should be chosen for variation. Because proteins
cannot be amplified, the researchers must sequence the
recovered protein to learn which substitutions improve
binding. The researchers must limit the level of
diversity so that each variety of protein will be
present in sufficient quantity for the isolated
fraction to be sequenced.

Methods have been developed to separate cells
through their affinity to various substances. Methods
applied to animal cells reveal common problems: a)
non-specific interactions between cells and affinity
supports, and b) irreversible binding of cells to
affinity matrices (BONN88).

Ferenci and collaborators have published a series
of papers on the chromatographic isolation of mutants
of the maltose-transport protein LamB of E. coli
(WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b,
FERE83, CLUN84, FERE86a, FERE86b, FERE86c, FERE87a,
FERE87b, HEIN87, and HEIN88). The papers report that
spontaneous and induced mutants at the lamB genetic
locus can be isolated by chromatography over a column
supporting immobilized maltose, maltodextrins, or
starch. The reports speculate that other applications
are possible, but specifically mention only the
elucidation of the residues responsible for the
selectivity of the maltodextrin pore or similar pore
proteins. The mutant proteins were non-chimeric, and
no attempt was made to obtain binding to a new target.
Both FERE86a and CLUN84 point up the
difficulties of working with live bacteria that can
metabolize chemicals and change their physiological
behavior during the chromatographic experiment.

A fragment of a heterologous gene can be
introduced into bacteriophage f1 gene III (SMIT85). If
the inserted gene preserves the original reading frame,
expression of the altered gene III causes an inserted
domain to appear in the gene III protein. Virions of
the resulting f1 strain are adsorbed by an
antibody against the protein encoded by the
heterologous DNA. The phage were eluted at pH 2.2 and
retained some infectivity. However, the single copy of
f1 gene III was used for insertion of the heterologous
gene so that all copies of gene III protein were
affected; infectivity of the resultant phage was
reduced 25-fold.

Smith presented his method as a way to isolate
cloned genes using antibodies to the gene products. He
made no mention of mutagenizing the inserted genetic
material or of inducing novel binding properties in the
inserted protein domain.

A fragment of the repeat region of the
circumsporozoite protein from Plasmodium falciparum has
been expressed on the surface of M13 as an insert in
the gene III protein (CRUZ88). The recombinant phage
were both antigenic and immunogenic in rabbits. The
authors do not suggest mutagenesis of the inserted
material.



Gene fragments coding for hepatitis B virus (HBV)
antigens have been fused to fragments of lamB; if
the fusion is in a region coding for exposed domains of
LamB, the HBV antigens appear on the cell surface and
are immunogenic (CHAR87). Charbit et al. (CHAR87)
suggest use of these engineered strains for development
of a live bacterial vaccine; they did not suggest
mutagenesis of the fused heterologous gene fragments,
nor development of binding capabilities.

Ladner, US Patent No. 4,704,692, "Computer Based
System and Method for Determining and Displaying
Possible Chemical Structures for Converting Double- or
Multiple-Chain Polypeptides to Single-Chain
Polypeptides", describes a design method for converting
proteins composed of two or more chains into proteins
of fewer polypeptide chains, but with essentially the
same 3D structure. There is no mention of variegated
DNA and no genetic selection. Ladner and Bird,
WO88/01649 (publ. March 10, 1988), disclose the specific
application of computerized design of linker peptides
to the preparation of single chain antibodies.

Ladner, Glick and Bird, WO88/06630 (publ. 1 Sept.
1988) (LGB), speculate that diverse single chain
antibody domains may be screened for binding to a
particular antigen by varying the DNA encoding the
combining determining regions of a single chain
antibody, subcloning the SCAD gene into the gpV gene of
phage lambda so that a SCAD/gpV chimera is displayed on
the outer surface of the phage, and selecting phage
which bind to the antigen through affinity
chromatography. The only antigen mentioned is bovine
growth hormone. No other binding molecules, targets,
carrier organisms, or outer surface proteins are
discussed. Nor is there any mention of the method or
degree of mutagenesis.

Ladner and Bird, WO88/06601 (publ. 7 September
1988), suggest that single chain "pseudodimeric"
repressors (DNA-binding proteins) may be prepared by
mutating a putative linker peptide followed by in vivo
selection, and that mutation and selection may be used to
create a dictionary of recognition elements for use in
the design of asymmetric repressors. The repressors
are not displayed on the outer surface of an organism.

No admission is made that any cited reference is
prior art or pertinent prior art, and the dates given
are those appearing on the reference and may not be
identical to the actual publication date.

SUMMARY OF THE INVENTION

This invention relates to the construction,
expression, and selection of mutated genes that specify
novel proteins with desirable binding properties, as
well as these proteins themselves. The substances
bound by these proteins, hereinafter referred to as
"targets", may be, but need not be, proteins. Targets
may include other biological or synthetic
macromolecules as well as organic and inorganic
molecules.

The novel binding proteins may be obtained: 1) by
mutating a gene encoding a known binding protein within
the subsequence encoding a known binding domain, 2)
by taking such a subsequence of the gene for a first
protein and combining it with all or part of a gene for
a second protein (which may or may not itself be a
known binding protein), 3) by mutating a gene encoding
a protein which, while not possessing a known binding
activity, possesses a secondary or higher structure
that lends itself to binding activity (clefts, grooves,
etc.), or 4) by mutating a gene encoding a known
binding protein but not in the subsequence known to
cause the binding. The protein from which the novel
binding protein is derived need not have any specific
affinity for the target material.

In one embodiment, the invention relates to: 

(a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by the individual packages,

(b) causing the expression of said protein and the
display of said protein on the outer surface of
such packages,

(c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a potential binding
domain that succeeds in binding the target
material from packages that do not so bind,

(d) recovering and replicating at least one package
bearing a successful binding domain,

(e) determining the amino acid sequence of the
successful binding domain of a genetic package
which bound to the target material,

(f) preparing a new variegated population of
replicable genetic packages according to step (a),
the parental potential binding domain for the
potential binding domains of said new packages
being a successful binding domain whose sequence
was determined in step (e), and repeating steps
(b)-(e) with said new population, and, when a
package bearing a binding domain of desired
binding characteristics is obtained,

(g) abstracting the gene encoding the desired
binding domain from the genetic package and
placing it into a suitable expression system.

(The binding domain may then be expressed as a
unitary protein, or as a domain of a larger
protein.)
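The cycle in steps (a) to (g) can be illustrated with a toy simulation in Python. This is a sketch under stated assumptions, not the method itself: a binding domain is modelled as a string of one-letter amino acid codes; "affinity" is a hypothetical score counting matches to an ideal binder that the experimenter never sees; affinity separation is modelled as keeping the best-scoring 1% of packages; and each cycle varies a different group of loci, echoing the option of choosing new loci of variation. All sequences, group sizes and population sizes are invented.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def variegate(parent: str, loci: list[int], n: int, rng: random.Random) -> list[str]:
    """Step (a): n daughter sequences identical to `parent` except at `loci`."""
    population = []
    for _ in range(n):
        seq = list(parent)
        for i in loci:
            seq[i] = rng.choice(AMINO_ACIDS)
        population.append("".join(seq))
    return population


def toy_affinity(seq: str, ideal: str) -> int:
    """Hypothetical binding score: number of residues matching an 'ideal' binder.
    In the real process affinity is observed through separation, not computed."""
    return sum(a == b for a, b in zip(seq, ideal))


def separate(population: list[str], ideal: str, keep_fraction: float = 0.01) -> list[str]:
    """Steps (c)-(d): retain the best-binding fraction, best first."""
    ranked = sorted(population, key=lambda s: toy_affinity(s, ideal), reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]


def forced_evolution(parent, ideal, loci_groups, goal, pop_size=2000, max_cycles=20, seed=0):
    """Run variegation cycles, varying one group of loci per cycle."""
    rng = random.Random(seed)
    ppbd = parent
    for cycle in range(1, max_cycles + 1):
        loci = loci_groups[(cycle - 1) % len(loci_groups)]
        population = variegate(ppbd, loci, pop_size, rng)  # (a)-(b)
        sbd = separate(population, ideal)[0]                # (c)-(e): best survivor
        if toy_affinity(sbd, ideal) > toy_affinity(ppbd, ideal):
            ppbd = sbd                                      # (f) SBD becomes next PPBD
        if toy_affinity(ppbd, ideal) >= goal:
            return ppbd, cycle                              # (g) ready for expression
    return ppbd, max_cycles


if __name__ == "__main__":
    parent = "ACDEFGHIKL"  # hypothetical parental binding domain
    ideal = "AWDEYGHRKL"   # hypothetical optimum, unknown to the experimenter
    best, cycles = forced_evolution(parent, ideal, [[1, 4], [7]], goal=len(ideal))
    print(best, cycles, toy_affinity(best, ideal))
```

With two loci varied per cycle there are only 400 combinations, so a population of 2000 almost always contains the best one; varying many loci at once would leave the population sampling only a small fraction of the 20^k possible sequences.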

The invention further relates to a method of 
preparing a mixed population of replicable genetic 
packages in which each package includes a gene 
expressing a potential binding protein in such a manner 
that the protein is presented on the outer surface of 
the package. This method comprises: 

i) preparing a variegated population of DNA
inserts, each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
structural signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages.

In a preferred embodiment, the
potential-binding-protein-encoding inserts are incorporated into a gene
encoding an outer-surface protein of the replicable
genetic package.

The invention encompasses the design and synthesis
of variegated DNA encoding a family of potential
binding proteins characterized by constant and variable
regions, said proteins being designed with a view
toward obtaining a protein that binds a predetermined
target.

For the purposes of this invention, the term 
"potential binding protein" refers to a protein encoded 
by one species of DNA molecule in a population of 
variegated DNA wherein the region of variation appears 
in one or more subsequences encoding one or more 
segments of the polypeptide having the potential of 
serving as a binding domain for the target substance.

From time to time, it may be helpful to speak of
the "parent sequence" of the variegated DNA. When the
novel binding domain sought is an analogue of a known
binding domain, the parent sequence is the sequence
that encodes the known binding domain. The variegated
DNA will be identical with this parent sequence at most
loci, but will diverge from it at chosen loci. When a
potential binding domain is designed from first 
principles, the parent sequence is a sequence which 
encodes the amino acid sequence that has been predicted 
to form the desired binding domain, and the variegated 
DNA is a population of "daughter DNAs" that are related 
to that parent by a high degree of sequence similarity. 

The fundamental principle of the invention is one
of forced evolution. The efficiency of the forced
evolution is greatly enhanced by careful choice of
which residues are to be varied. The 3D structure of
the potential binding domain is a key determinant in
this choice. First a set of residues that can
simultaneously contact one molecule of the target is
identified. Then all or some of the codons encoding
these residues are varied simultaneously to produce a
variegated population of DNA. The variegated
population of DNA is used to transform cells so that a
variegated population of genetic packages is produced.

The mixed population of genetic packages
containing genes encoding possible binding proteins is
enriched for packages containing genes that express
proteins that in fact bind to the target ("successful
binding domains"). After one or more rounds of such
enrichment, one or more of the chosen genes are
examined and sequenced. If desired, new loci of
variation are chosen. The selected daughter genes of
one generation then become the parent sequences for the
next generation of variegated DNA, beginning the next
"variegation cycle." Such cycles are continued until a
protein with the desired target affinity is obtained.
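The first step of that choice, finding residues that can simultaneously contact one target molecule, can be sketched geometrically: given atom coordinates for the potential binding domain and for one docked pose of the target, keep the residues having any atom within a contact distance of the target. The Python sketch below is an illustration only; the 5 Å cutoff, the cap of 10 residues (chosen to echo the 5-10 varied codons preferred elsewhere in this specification), the residue numbers and the coordinates are all assumptions.

```python
import math

Point = tuple[float, float, float]


def contact_residues(residues: dict[int, list[Point]], target_atoms: list[Point],
                     cutoff: float = 5.0, max_residues: int = 10) -> list[int]:
    """Residue numbers having any atom within `cutoff` angstroms of the target pose,
    closest first, capped at `max_residues` (both limits are assumptions)."""
    scored = []
    for res_id, atoms in residues.items():
        closest = min(math.dist(a, t) for a in atoms for t in target_atoms)
        if closest <= cutoff:
            scored.append((closest, res_id))
    scored.sort()
    return [res_id for _, res_id in scored[:max_residues]]


if __name__ == "__main__":
    # Hypothetical side-chain atom coordinates (angstroms) for three residues.
    residues = {
        12: [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
        14: [(3.0, 0.0, 0.0)],
        40: [(20.0, 0.0, 0.0)],
    }
    target = [(1.0, 1.0, 0.0)]  # one docked target molecule (hypothetical)
    print(contact_residues(residues, target))  # residues 12 and 14 lie within 5 A
```

Because all distances are measured to a single target pose, every residue returned could touch the same target molecule at once, which is the property the text asks for.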
The appended claims are hereby incorporated by
reference into this specification as an enumeration of
the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic showing the relationships
between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used
to create a novel protein with affinity for a
predetermined target.

Figure 3 is a schematic of a PBD contacting a molecule
of target material.

Figure 4 is a schematic of the construction of pLG3
from M13mp18 and pBR322.

Figure 5 is a schematic of the construction of pLG7
from pLG3 and synthetic DNA.

DETAILED DESCRIPTION OF THE INVENTION

Sec. 0.1: Overview: 

The present invention separates mutated genes that
specify novel proteins with desirable binding
properties from closely related genes that specify
proteins with no or undesirable binding properties, by:
1) arranging that the product of each mutated gene be
displayed on the outer surface of a replicable genetic
package that contains the gene, and 2) using affinity
separation incorporating a desirable target material to
enrich the population of packages for those packages
containing genes specifying proteins with improved
binding to that target material.

Let K_D(x,y) be a dissociation constant,

$$K_D(x,y) = \frac{[x]\,[y]}{[x{:}y]}$$

For the purposes of the appended claims, a protein
P is a binding protein if:

(1) for one molecular, ionic or atomic species A,
the dissociation constant K_D(P,A)
< 10^-[illegible] moles/liter; and

(2) for a different molecular, ionic or
atomic species B, K_D(P,B) > 10^-[illegible]
moles/liter.

As a result of these two conditions, the protein P
exhibits specificity for A over B, and a minimum degree
of affinity (or avidity) for A.
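The definition combines a dissociation constant computed from equilibrium concentrations with a two-sided test. Because the numeric limits in the two conditions are not recoverable from the text above, the Python sketch below takes them as caller-supplied parameters; the limits and concentrations in the example are placeholders chosen only for illustration, not the values of the claims.

```python
def dissociation_constant(free_x: float, free_y: float, complex_xy: float) -> float:
    """K_D(x, y) = [x][y] / [x:y], with all concentrations in mol/L."""
    if complex_xy <= 0:
        raise ValueError("complex concentration must be positive")
    return free_x * free_y / complex_xy


def is_binding_protein(kd_p_a: float, kd_p_b: float,
                       tight_limit: float, weak_limit: float) -> bool:
    """Apply both conditions: K_D(P, A) < tight_limit and K_D(P, B) > weak_limit.

    The limits are caller-supplied placeholders, not the values in the claims.
    """
    return kd_p_a < tight_limit and kd_p_b > weak_limit


if __name__ == "__main__":
    # Hypothetical equilibrium concentrations (mol/L) for P with species A.
    kd_a = dissociation_constant(free_x=1e-9, free_y=1e-6, complex_xy=1e-6)
    # Placeholder limits chosen only for illustration.
    print(kd_a, is_binding_protein(kd_a, kd_p_b=1e-2, tight_limit=1e-6, weak_limit=1e-4))
```

Both conditions must hold: a protein that binds A tightly but also binds B tightly fails the second test, which is what gives P its specificity for A over B.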

When a domain of a protein is primarily
responsible for the protein's ability to specifically
bind a chosen target, it is referred to herein as a
"binding domain" (BD). We engineer the appearance of a
stable protein domain, denoted as an "initial potential
binding domain" (IPBD), on the surface of a genetic
package. The present invention is concerned with the
expression of numerous, diverse, variant "potential
binding domains" (PBD), all related to a "parental
potential binding domain" (PPBD) such as the binding
domain of a known binding protein, and with selection
and amplification of the genes encoding the most
successful mutant PBDs. An IPBD is chosen as PPBD for
the first round of variegation. Selection-through-binding
isolates one or more "successful binding
domains" (SBD). An SBD from one round of variegation
and selection-through-binding is chosen to be the PPBD
for the next round. The invention is not, however,
limited to proteins with a single BD since the method
may be applied to any or all of the BDs of the protein,
sequentially or simultaneously. The relationships of
the various BDs are illustrated in Figure 1.

The term "variegated DNA" refers to a population
of molecules that have the same base sequence through
most of their length, but that vary at a limited number
of defined loci, preferably 5-10 codons. A molecule of
variegated DNA can be introduced into a plasmid so that
it constitutes part of a gene (OLIP86, OLIP87, AUSU87,
REID88). When plasmids containing variegated DNA are
used to transform bacteria, each cell makes a version
of the original protein. Each colony of bacteria may
produce a different version from any other colony. If
the variegations of the DNA are concentrated at loci
known to be on the surface of the protein or in a loop,
a population of proteins will be generated, many
members of which will fold into roughly the same 3D
structure as the parent protein. The specific binding
properties of each member, however, may be different
from each other member. It remains to sort out the
colonies containing genes for proteins with desirable
binding properties from those that do not exhibit the
desired affinities.
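A minimal Python sketch of variegated DNA: daughter genes are copies of a parent sequence in which only chosen codons are replaced. It assumes the NNK codon scheme (any base at positions 1 and 2, G or T at position 3) purely for illustration, and the 6-codon parent is hypothetical. It also tabulates how quickly diversity grows over the preferred range of 5-10 varied codons: 20^5 is 3.2 million protein sequences, while 20^10 exceeds 10^13.

```python
import random

BASES = "ACGT"


def nnk_codon(rng: random.Random) -> str:
    """One random NNK codon: any base at positions 1-2, G or T at position 3."""
    return rng.choice(BASES) + rng.choice(BASES) + rng.choice("GT")


def variegate_gene(parent: str, codon_positions: list[int], rng: random.Random) -> str:
    """A daughter DNA identical to `parent` except at the chosen codon positions
    (0-based codon indices), each of which receives a random NNK codon."""
    if len(parent) % 3 != 0:
        raise ValueError("parent length must be a multiple of 3")
    codons = [parent[i:i + 3] for i in range(0, len(parent), 3)]
    for pos in codon_positions:
        codons[pos] = nnk_codon(rng)
    return "".join(codons)


def library_diversity(n_codons: int) -> tuple[int, int]:
    """(distinct NNK DNA sequences, distinct 20-letter protein sequences)
    when n_codons codons are varied at once."""
    return 32 ** n_codons, 20 ** n_codons


if __name__ == "__main__":
    rng = random.Random(1)
    parent = "ATGAAATGCGGTACCTGG"  # hypothetical 6-codon parent
    for _ in range(3):
        print(variegate_gene(parent, [1, 3], rng))
    for n in range(5, 11):  # the preferred 5-10 varied codons
        dna, protein = library_diversity(n)
        print(f"{n:2d} codons: {dna:.2e} DNA sequences, {protein:.2e} protein sequences")
```

The table suggests why variation is concentrated at a limited number of loci: each added codon multiplies the number of possible proteins by 20, soon outrunning the number of genetic packages that can be produced and screened.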

A "single-chain antibody" is a single chain
polypeptide comprising at least 200 amino acids, said
amino acids forming two antigen-binding regions
connected by a peptide linker that allows the two
regions to fold together to bind the antigen. Either
the two antigen-binding regions must be variable
domains of known antibodies, or they must (1) each fold
into a beta barrel of nine strands that are spatially
related in the same way as are the nine strands of
known antibody variable light or heavy domains, and (2)
fit together in the same way as do the variable domains
of said known antibody. Generally speaking, this will
require that, with the exception of the amino acids
corresponding to the hypervariable region, there is at
least 88% homology with the amino acids of the variable
domain of a known antibody.

The term "affinity separation means" includes, but
is not limited to: a) affinity column chromatography,
b) batch elution from an affinity matrix material, c)
batch elution from an affinity material attached to a
plate, d) fluorescence activated cell sorting, and e)
electrophoresis in the presence of target material.
"Affinity material" is used to mean a material with
affinity for the material to be purified, called the
"analyte". In most cases, the association of the
affinity material and the analyte is reversible so that
the analyte can be freed from the affinity material
once the impurities are washed away.

Affinity column chromatography, batch elution from
an affinity matrix material held in some container, and
batch elution from a plate are very similar and
hereinafter will be treated under "affinity
chromatography."
Fluorescence-activated cell sorting (FACS) involves use of
an affinity material that is fluorescent per se or is
labeled with a fluorescent molecule. Current
commercially available cell sorters require 800 to 1000
molecules of fluorescent dye, such as Texas red, bound
to each cell. FACS can sort 10^[illegible] cells or viruses/sec.

Electrophoretic affinity separation involves
electrophoresis of viruses or cells in the presence of
target material, wherein the binding of said target
material changes the net charge of the virus particles
or cells. It has been used to separate bacteriophages
on the basis of charge (SKRW87).

The present invention makes use of affinity
separation of bacterial cells or bacterial viruses (or
other genetic packages) to enrich a population for
those cells or viruses carrying genes that code for
proteins with desirable binding properties.

In the present invention, the words "select" and
"selection" are used exclusively in the genetic sense,
i.e., a biological process whereby a phenotypic
characteristic is used to enrich a population for those
organisms displaying the desired phenotype.

The process of the present invention comprises
three major parts:

I. design and production of a replicable
genetic package (GP) that displays an IPBD on
the surface of the GP, denoted GP(IPBD),

II. design and implementation of an affinity
separation process that separates GP(IPBD)s
that bind to a known affinity molecule from
wild-type GPs or GP(IPBD'')s, neither of which
binds the known affinity molecule, and

III. design and implementation of a genetic
variegation method, denoted structure-directed
mutagenesis, wherein a population of
10^[illegible] or more different GP(PBD)s, denoted
GP(vgPBD), is produced.

One affinity separation is called a "separation cycle";
one pass of variegation followed by as many separation
cycles as are needed to isolate an SBD is called a
"variegation cycle". The amino acid sequence of one
SBD from one round becomes the PPBD to the next
variegation cycle. We perform variegation cycles
iteratively until the desired affinity and specificity
of binding between an SBD and the chosen target are
achieved.
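The relationship between separation cycles and variegation cycles can be illustrated with a toy simulation. This is a minimal sketch only, not the procedure of the invention: PBDs are modeled as short amino-acid strings, a hypothetical `score` function stands in for binding to the target, and an idealized affinity separation simply keeps the best-scoring tenth of the population. `IDEAL`, `variegate`, `separation_cycle`, and `variegation_cycles` are illustrative names introduced here, not part of the source.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def variegate(ppbd, positions, n_variants, rng):
    """One variegation: randomize the chosen residues of the PPBD (toy model)."""
    variants = []
    for _ in range(n_variants):
        seq = list(ppbd)
        for i in positions:
            seq[i] = rng.choice(ALPHABET)
        variants.append("".join(seq))
    return variants


def separation_cycle(population, score, keep_fraction=0.1):
    """One idealized separation cycle: keep the best-binding fraction."""
    ranked = sorted(population, key=score, reverse=True)
    return ranked[:max(1, int(len(ranked) * keep_fraction))]


def variegation_cycles(ippbd, position_sets, score, goal, n_variants=2000, seed=0):
    """Run variegation cycles; the SBD of each cycle becomes the next PPBD."""
    rng = random.Random(seed)
    ppbd = ippbd
    for positions in position_sets:
        population = variegate(ppbd, positions, n_variants, rng)
        # As many separation cycles as needed to isolate a single SBD.
        while len(population) > 1:
            population = separation_cycle(population, score)
        sbd = population[0]
        if score(sbd) > score(ppbd):  # toy guard: keep the PPBD if nothing improved
            ppbd = sbd
        if score(ppbd) >= goal:
            break
    return ppbd


# Usage with a hypothetical optimal binder; score counts matching residues.
IDEAL = "WKYRG"
score = lambda s: sum(a == b for a, b in zip(s, IDEAL))
best = variegation_cycles("AAAAA", [[0, 1], [2, 3], [4]], score, goal=5)
print(best, score(best))
```

Each pass through the outer loop corresponds to one variegation cycle, and each call to `separation_cycle` to one separation cycle; a real affinity separation is, of course, far less ideal than ranking by a known score.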


Part I is a strain construction in which we deal
with a single IPBD sequence. Variability may be
introduced into DNA subsequences adjacent to the ipbd
subsequence and within the osp-ipbd gene so that the
IPBD will appear on the GP surface. A molecule, such
as an antibody, having high affinity for correctly
folded IPBD is used to: a) detect IPBD on the GP
surface, b) screen colonies for display of IPBD on the
GP surface, or c) select GPs that display IPBD from a
population, some members of which might display IPBD on
the GP surface. In one preferred embodiment, Part I of
the process involves:

1) choosing a GP such as a bacterial cell (Sec.
1.1.1), bacterial spore (Sec. 1.2.1), or phage (Sec. 1.3.1)
having a suitable outer surface protein (Secs.
1.1.3, 1.2.3, and 1.3.3),

2) choosing a stable IPBD (Sec. 2),

3) designing an amino acid sequence that: a)
includes the IPBD as a subsequence and b) will
cause the IPBD to appear on the GP surface (Secs.
1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c)
introduces convenient sites for genetic
manipulation (Secs. 4.1, 4.2, 4.3, 5.1, and 5.2),

5) cloning the osp-ipbd gene into the GP (Sec.
6.1), and

6) harvesting the transformed GPs (Sec. 7) and
testing them for presence of IPBD on the GP
surface (Sec. 8); this test is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD).

In another preferred embodiment, Part I of the process
involves:

1) and 2) as above,

3) designing a DNA sequence that: a) encodes the
IPBD as a subsequence, b) contains suitable
restriction sites so that random DNA may be
operably linked to the ipbd gene fragment, and c)
provides the necessary genetic regulation; this
DNA sequence is called a "display probe" (Secs.
1.1.4, 1.2.4, 1.3.4, and 4),

4) constructing that display probe,

5) cloning the display probe into the OCV and
amplifying it in a suitable host,

6) cloning random or pseudorandom DNA into one of
the restriction sites provided in the display
probe (Sec. 6.2), whereby the random or
pseudorandom DNA functions as a potential osp, and

7) harvesting GPs (Sec. 7) and screening colonies of
the transformed GPs for presence of IPBD on the GP
surface; this screening is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD) (Sec. 8); or, alternatively,

8) selecting GPs that display IPBD by use of an
affinity separation using AfM(IPBD) (Sec. 8).

Once a GP(IPBD) is produced, it can be used many
times as the starting point for developing different
novel proteins that bind to a variety of different
targets. The knowledge of how we engineer the
appearance of one IPBD on the surface of a GP can be
used to design and produce other GP(IPBD)s that display
different IPBDs.

Although Part I deals with only a single IPBD,
many preparations are made for Part III, where we
introduce numerous mutations into the potential binding
domain. References to PBD or pbd in Part I are to
indicate a preparatory intent.

In Part II we optimize separation of GP(IPBD) from
wild-type GP, denoted wtGP, based on the affinity of
IPBD for AfM(IPBD) and establish the sensitivity of the
affinity separation process. In a preferred
embodiment, Part II of the process of the present
invention involves:

1) preparing affinity columns bearing AfM(IPBD) at
various densities of AfM(IPBD)/(volume of matrix)
(Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of
IPBD per GP,

3) picking a gradient regime for eluting the
columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP,
b) density of AfM(IPBD)/(volume of support), c)
initial ionic strength, d) elution rate, and e)
(amount of GP)/(volume of support) loaded gives
the best separation of GP(IPBD) from wtGP (Sec.
10.1),

5) determining the smallest amount of GP(IPBD)
that can be isolated from a much larger amount of
wtGP using the optimal conditions (Sec. 10.2), and

6) determining the efficiency of the affinity
separation procedure (Sec. 10.3).
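Steps 5) and 6) above can be made quantitative with a simple model. As a sketch under stated assumptions (not taken from the source): suppose each separation cycle retains a binding GP(IPBD) with probability `r_bind` and a wtGP with probability `r_nonbind`, and that amplification between cycles favors neither type. The ratio of binders to non-binders is then multiplied by `r_bind / r_nonbind` on every cycle. The function names and the retention values in the usage lines are hypothetical.

```python
import math


def fraction_after_cycles(p0, r_bind, r_nonbind, k):
    """Fraction of binders after k separation cycles, starting from fraction p0."""
    ratio = p0 / (1.0 - p0) * (r_bind / r_nonbind) ** k
    return ratio / (1.0 + ratio)


def cycles_to_majority(p0, r_bind, r_nonbind):
    """Smallest number of cycles after which binders are at least half the population."""
    enrichment = r_bind / r_nonbind
    if enrichment <= 1.0:
        raise ValueError("separation must retain binders better than non-binders")
    if p0 >= 0.5:
        return 0
    return math.ceil(math.log((1.0 - p0) / p0) / math.log(enrichment))


# Hypothetical numbers: 1 binder per 10^6 GPs, 50% of binders and 0.1% of
# non-binders retained per cycle (a 500-fold enrichment per cycle).
k = cycles_to_majority(1e-6, 0.5, 1e-3)
print(k, round(fraction_after_cycles(1e-6, 0.5, 1e-3, k), 3))
```

Under this model the sensitivity of step 5) and the efficiency of step 6) enter only through the ratio `r_bind / r_nonbind`, which is why measuring both retention values is useful.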

Part II optimizes separation of a single type of
GP(IPBD) from a large excess of a single different GP.
The optimum conditions will be used in Part III to
separate GP(PBD)s that bind the target from GP(PBD)s
that do not bind the target. The optimization will be
at one or more specific temperatures and at one or more
specific pHs. In Part III, the user must specify the
conditions under which the selected SBD should bind the
target. If the conditions of intended use differ
markedly from the conditions for which affinity
separation was optimized, the user must return to Part
II and optimize the affinity separation for conditions
similar to the conditions of intended use of the
selected SBD.

In Part III, we choose a target material and a
GP(IPBD) that was developed by the method of Part I and
that is suitable to the target material. Using IPBD as
the PPBD to the first cycle of variegation, we prepare
a wide variety of osp-pbd genes that encode a wide
variety of PBDs. We use an affinity separation,
developed by the method of Part II, to enrich the
population of GP(vgPBD)s for GPs that display PBDs with
binding properties relative to the target that are
superior to the binding properties of the PPBD. An SBD
selected from one variegation cycle becomes the PPBD to
the next variegation cycle. In a preferred embodiment,
Part III of the process of the present invention
involves:

1) picking a target molecule (Sec. 11),

2) picking a GP(IPBD) (Sec. 12),

3) picking a set of several residues in the PPBD
to vary based on a) the 3D structure of the IPBD,
b) sequences of homologous proteins, and c)
computer or theoretical modeling that indicates
which residues can tolerate different amino acids
without disrupting the underlying structure (Sec.
13.1),

4) picking a subset of the residues to be varied
simultaneously based on the number of different
variants and which variants are within the
detection capabilities of the affinity separation
(Sec. 13.2),

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene
that encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA
(Sec. 13.3),

b) ligating this vgDNA, by standard methods,
into the operative cloning vector (OCV) (e.g. a
plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells (Sec. 14.2),

d) culturing (i.e. increasing in number) the
population of transformed cells and harvesting
the population of GP(PBD)s, said population
being denoted as GP(vgPBD) (Sec. 14.3),

e) enriching the population for GPs that bind
the target by using the affinity separation
process developed in Part II, with the chosen
target molecule as affinity molecule (Sec. 15),

f) repeating steps III.5.d and III.5.e until a
GP(SBD) having improved binding to the target
is isolated (Sec. 15), and

g) testing the isolated SBD or SBDs for
affinity and specificity for the chosen target
(Sec. 15.8), and

6) repeating steps III.3, III.4, and III.5 until
the desired degree of binding is obtained.
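Steps 4) and 5a) both depend on what a given nucleotide mixture actually produces. The sketch below assumes the standard genetic code and independent incorporation of bases at each codon position; it computes Abun(x), the abundance of DNA molecules encoding each amino acid x (see Sec. 0.2), for one varied codon, and the number of distinct DNA sequences when k codons are varied with the same mixtures. The function names and the equimolar mixture in the usage lines are illustrative assumptions; the source does not specify particular mixtures here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' = stop.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def abun(mix1, mix2, mix3):
    """Abun(x) for one codon synthesized from base mixtures at positions 1-3.

    Each mix maps a base ('T', 'C', 'A', 'G') to its fraction; fractions sum to 1.
    """
    result = {}
    for i, j, k in product(range(4), repeat=3):
        f = (mix1.get(BASES[i], 0.0) * mix2.get(BASES[j], 0.0)
             * mix3.get(BASES[k], 0.0))
        if f > 0.0:
            aa = CODE[16 * i + 4 * j + k]
            result[aa] = result.get(aa, 0.0) + f
    return result


def n_dna_variants(mix1, mix2, mix3, k):
    """Distinct DNA sequences when k codons are varied with the same mixtures."""
    per_codon = 1
    for mix in (mix1, mix2, mix3):
        per_codon *= sum(1 for frac in mix.values() if frac > 0.0)
    return per_codon ** k


# Illustration: an equimolar mixture (NNN) at every position.
N = {b: 0.25 for b in BASES}
a = abun(N, N, N)
print(round(a["L"], 4), round(a["W"], 4), round(a["*"], 4))
print(n_dna_variants(N, N, N, 5))
```

Comparing the output of `n_dna_variants` with the number of GPs that the affinity separation can actually handle is one way to judge how many residues to vary simultaneously in step 4).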

Part III is repeated for each new target material.

Part I need be repeated only if no GP(IPBD) suitable to
a chosen target is available. Part II need be repeated
for each newly developed GP(IPBD) and for previously
developed GP(IPBD)s if the intended conditions of use
of a novel binding protein differ significantly from
the conditions of previous optimizations.

Sec. 0.2: Abbreviations:

The following abbreviations will be used
throughout the present invention:

Abbreviation   Meaning

GP             Genetic Package, e.g. a bacteriophage
X              Any protein
x              The gene for protein X
IPBD           Initial Potential Binding Domain, e.g. BPTI
PBD            Potential Binding Domain, e.g. a derivative of BPTI
SBD            Successful Binding Domain, e.g. a derivative of BPTI
               selected for binding to a target
PPBD           Parental Potential Binding Domain, i.e. an IPBD or an
               SBD from a previous selection
OSP            Outer Surface Protein, e.g. coat protein of a phage or
               LamB from E. coli
OSP-PBD        Fusion of an OSP and a PBD, order of fusion not specified
OSTS           Outer Surface Transport Signal
GP(x)          A genetic package containing the x gene
GP(X)          A genetic package that displays X on its outer surface
{Q}            An affinity matrix supporting "Q", e.g. {T4 lysozyme} is
               T4 lysozyme attached to an affinity matrix
AfM(W)         A molecule having affinity for "W", e.g. trypsin is an
               AfM(BPTI)
XINDUCE        A chemical that can induce expression of a gene, e.g.
               IPTG for the lacUV5 promoter
OCV            Operative Cloning Vector
K_T            [T][SBD]/[T:SBD] (T is a target)
K_N            [N][SBD]/[N:SBD] (N is a non-target)
DoAMoM         Density of AfM(W) on affinity matrix
Abun(x)        Abundance of DNA molecules encoding amino acid x
OMP            Outer membrane protein
nt             nucleotide
Kd             A bimolecular dissociation constant, [A][B]/[A:B]
^err           Error level in synthesizing vgDNA
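The dissociation constants K_T and K_N defined above determine how much of a displayed SBD is occupied at a given free concentration of target or non-target. Rearranging K_T = [T][SBD]/[T:SBD] gives the bound fraction [T:SBD]/([SBD] + [T:SBD]) = [T]/([T] + K_T), and likewise [N]/([N] + K_N) for a non-target. The sketch assumes the free ligand concentration is known (for example, ligand in large excess over SBD); the function name and the concentrations and constants in the usage lines are hypothetical.

```python
def fraction_bound(free_ligand, kd):
    """Fraction of SBD in complex, from Kd = [L][SBD]/[L:SBD] (same units for both)."""
    if free_ligand < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return free_ligand / (free_ligand + kd)


# Hypothetical: K_T = 1 nM for the target, K_N = 1 uM for a non-target,
# both present at 10 nM free concentration (values in molar).
bound_T = fraction_bound(1e-8, 1e-9)
bound_N = fraction_bound(1e-8, 1e-6)
print(f"target: {bound_T:.3f}  non-target: {bound_N:.4f}")
```

The ratio K_N/K_T is a convenient single number for specificity, but the bound fractions themselves also depend on the concentrations at which target and non-target are encountered.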
Sec. 0.3: Standard sequencing method:

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts)
in DNA subsequences. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures (AUSU87).

The present invention is not limited to a single
method of determining protein sequences, and reference
in the appended claims to determining the amino acid
sequence of a domain is intended to include any
practical method or combination of methods, whether
direct or indirect. The preferred method, in most
cases, is to determine the sequence of the DNA that
encodes the protein and then to infer the amino acid
sequence. In some cases, standard methods of protein
sequence determination may be needed to detect
post-translational processing.

The major steps in the process of making and
isolating a novel binding protein with affinity for a
chosen target material are illustrated in Figure 2.

Sec. 1: Specification of Genetic Package and Means for
Displaying a Heterologous Binding Domain On Its Outer
Surface:

Sec. 1.0: General Requirements for Genetic Packages

It is emphasized that the GP on which
selection-through-binding will be practiced must be capable,
after the selection, either of growth in some suitable
environment or of in vitro amplification and recovery
of the encapsulated genetic message. During at least
part of the growth, the increase in number must be
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^ or less. Once this component of the population
is separated from the non-binding components, it must
be possible to amplify it. Culturing viable cells is
the most powerful amplification of genetic material
known and is preferred. Genetic messages can also be
amplified in vitro, but this is not preferred.
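Because growth is approximately exponential, the culture needed to amplify the few GPs recovered from a separation can be estimated by counting doublings. This is a sketch assuming a fixed doubling time and no losses; the function names, the recovered and desired counts, and the 30-minute doubling time in the usage lines are illustrative assumptions (the 20-40 minute range for cellular GPs is mentioned later in this section).

```python
import math


def doublings_needed(recovered, desired):
    """Number of doublings to grow `recovered` GPs to at least `desired` GPs."""
    if recovered <= 0:
        raise ValueError("at least one GP must be recovered")
    if recovered >= desired:
        return 0
    return math.ceil(math.log2(desired / recovered))


def amplification_time(recovered, desired, doubling_minutes):
    """Culture time (minutes) assuming uninterrupted exponential growth."""
    return doublings_needed(recovered, desired) * doubling_minutes


# Hypothetical: 10 GPs recovered, 10^10 wanted, 30-minute doubling time.
n = doublings_needed(10, 1e10)
print(n, amplification_time(10, 1e10, 30) / 60, "hours")
```

Since the count grows as 2^n, even a handful of recovered GPs reaches a workable population within a day at typical bacterial doubling times, which is part of why culturing is the preferred means of amplification.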

A GP may typically be a vegetative bacterial cell,
a bacterial spore, or a bacterial DNA virus. A strain
of any living cell or virus is potentially useful if
the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility,
and

4) manipulated to display the potential binding
protein domain where it can interact with the
target material during affinity separation.

DNA encoding the IPBD sequence may be operably
linked to DNA encoding at least the outer surface
transport signal of an outer surface protein (OSP)
native to the GP so that the IPBD is displayed on the
outer surface of the GP. It should be possible to
cause a genetic package to display the IPBD or PBD on
its outer surface without adversely affecting the
viability of the GP or the binding characteristics of
the IPBD or PBD if the fusion is near domain
boundaries (BECK83, CRAW87, TOTH86, SMIT85, MANO86;
cf. ROSS81, HOLL83).

Those characteristics of a protein that are
recognized by a cell and that cause it to be
transported out of the cytoplasm and displayed on the
cell surface will be termed "outer-surface transport
signals".

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the
osp-ipbd gene) through the selection-through-binding
process (see Sec. 14) is referred to hereinafter as the
operative cloning vector (OCV). When the OCV is a
phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability
of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example,
by freezing. If the GP is a cell, it should have a
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g. a burst size
of at least 100/infected cell. GPs which are finicky
or expensive to culture are disfavored. The GP should
be easy to harvest, preferably by centrifugation. The
GP is preferably stable for a temperature range of -70
to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4^2-, common
organic solvents such as ether and acetone, and
degradative enzymes. Finally, there must be a suitable
OCV (see Sec. 3).

Preferably, the 3D structure of the OSP and the
sequence of the OSP gene p. 47 are known. If the 3D
structure is not known, there is preferably knowledge
of which residues are exposed on the cell surface, of the
location of the domain boundaries within the OSP,
and/or of successful fusions of the OSP and a foreign
insert. The OSP preferably appears in numerous copies
on the outer surface of the GP, and preferably serves a
non-essential function. It is desirable that the OSP
not be post-translationally processed, or at least that
this processing be understood.

The preferred GP, OCV, and OSP are those for which
the fewest serious obstacles can be seen, rather than
those that score highest on any one criterion.

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) phages (Sec. 1.3). Preferred OSPs for
several GPs are given in Table 2.

Sec. 1.1: Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd (derived from osp-ipbd) during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as an Fe4S4 cluster, that are
available within the cell but not in the medium. The
formation of Fe4S4 clusters found in some ferredoxins
is catalyzed by enzymes found in the cell (BONO85).
IPBDs that require such prosthetic groups may fail to
fold or function if displayed on bacterial cells.

Sec. 1.1.1: Preferred Bacterial Cells as GP:

In view of the extensive knowledge of E. coli, a
strain of E. coli defective in recombination is the
strongest candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.

Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells:

Gram-negative bacteria have outer-membrane
proteins (OMPs) that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, CLEM81). If no fusion data are
available, then we fuse an ipbd fragment to various
fragments of the osp gene and obtain GPs that display
the osp-ipbd fusion on the cell outer surface by
screening or selection for the display-of-IPBD
phenotype.

Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara (NIKA87) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal
sequence, which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues 1-43 are insufficient.

LamB of E. coli is a porin for maltose and
maltodextrin transport, and serves as the receptor for
adsorption of bacteriophages lambda and K10. This
protein has been purified to homogeneity (ENDE78) and
shown to function as a trimer (PALV79). Mutations to
phage resistance have been used to define the parts of
the LamB protein that adsorb each phage (ROAM80,
CLEM81, CLEM83, GEHR87).

Topological models have been developed that
describe the function of phage receptor and
maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CLEM81, CLEM83, CHAR84,
HEIN88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF, and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunized with purified LamB have
been used to characterize four distinct topological and
functional regions, two of which are concerned with
maltose transport (GABA82).

Sec. 1.1.3: Choice of Insertion Site for IPBD in
Bacterial Cell OSP:

For fusions of phoA into the coding sequence
for an integral membrane protein, the PhoA domain is
localized according to where in the integral membrane
protein the phoA gene was inserted (BECK83 and MANO86).
That is, if phoA is inserted after an amino acid which
normally is found in the cytoplasm, then PhoA appears
in the cytoplasm. If phoA is inserted after an amino
acid normally found in the periplasm, however, then the
PhoA domain is localized on the periplasmic side of the
membrane and anchored in it. Beckwith and colleagues
(BECK88) have extended these observations to the lacZ
gene, which can be inserted into genes for integral
membrane proteins such that the LacZ domain appears in
either the cytoplasm or the periplasm according to
where the lacZ gene was inserted.
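The rule reported for PhoA and LacZ fusions, that the fused domain ends up on the side of the membrane where the residue preceding the insertion point normally lies, can be expressed as a lookup against a topology map. This is a minimal sketch: `fused_domain_side` and `TOPOLOGY` are hypothetical names, the topology describes an invented two-pass membrane protein rather than any real OSP, and insertion points inside a membrane-spanning segment are simply reported as "membrane", since the observations cited above do not cover that case.

```python
from typing import List, Tuple

# (first residue, last residue, location); location is
# "cytoplasm", "membrane", or "periplasm".
Segment = Tuple[int, int, str]


def fused_domain_side(topology: List[Segment], insert_after: int) -> str:
    """Predict where a domain fused after residue `insert_after` will be localized."""
    for first, last, side in topology:
        if first <= insert_after <= last:
            return side
    raise ValueError(f"residue {insert_after} is not covered by the topology map")


# Hypothetical two-pass integral membrane protein.
TOPOLOGY = [
    (1, 20, "cytoplasm"),
    (21, 40, "membrane"),
    (41, 80, "periplasm"),
    (81, 100, "membrane"),
    (101, 130, "cytoplasm"),
]

for site in (10, 60, 110):
    print(site, fused_domain_side(TOPOLOGY, site))
```

Read in reverse, the same map suggests which candidate fusion points in an OSP are worth trying when the goal is to place the IPBD on the outer face rather than in the cytoplasm.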

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of osp and x have been shown to
display X on the cell surface, we can design an
osp-ipbd gene by substituting ipbd for x in the DNA
sequence. Otherwise, successful OMP-IPBD fusion is
preferably sought by fusing fragments of the best omp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use
the available data about the OMP to pick the point or
points of fusion between omp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate osp at several sites or in a manner that
produces osp fragments of variable length and fuse the
osp fragments to ipbd; cells expressing the fusion are
then screened or selected for display of IPBDs on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of omp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.

The promoter for the osp-ipbd gene preferably is
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) (lacUV5 promoter).
It need not come from a natural osp gene; any
regulatable bacterial promoter can be used (MANI82).

Once a genetic packaging system employing
vegetative bacterial cells has been designed, it is
time to choose an IPBD (Sec. 2).

Sec. 1.1.4: In Vivo Selection for Pseudo-osp Gene From
Random DNA Inserts in Bacterial Cells:

As an alternative to choosing a natural OSP and an
insertion site in the OSP, we can construct a gene
comprising: a) a regulatable promoter (e.g. lacUV5), b)
a Shine-Dalgarno sequence, c) a periplasmic transport
signal sequence, d) a fusion of the ipbd gene with a
segment of random DNA (as in Kaiser et al. (KAIS87)),
e) a stop codon, and f) a transcriptional terminator.
The random DNA, which preferably comprises 90-300
bases, encodes numerous potential OSTSs (cf. KAIS87).
The fusion of ipbd and the random DNA could be in
either order, but ipbd upstream is slightly preferred.
Isolates from the population generated in this way can
be screened for display of the IPBD. Preferably, a
version of selection-through-binding is used to select
GPs that display IPBD on the GP surface, and thus
contain a DNA insert encoding a functional OSTS.
Alternatively, clonal isolates of GPs may be screened
for the display-of-IPBD phenotype.

The preference for ipbd upstream of the random DNA
arises from consideration of the manner in which the
successful GP(IPBD) will be used. In Part III, we will
introduce numerous mutations into the pbd region of the
osp-pbd gene, some of which might include gratuitous
stop codons. If pbd precedes the random DNA, then
gratuitous stop codons in pbd lead to no OSP-PBD
protein appearing on the cell surface. If pbd follows
the random DNA, then gratuitous stop codons in pbd
might lead to incomplete OSP-PBD proteins appearing on
the cell surface. Incomplete proteins often are
non-specifically sticky, so that GPs displaying incomplete
PBDs are easily removed from the population.
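The concern about gratuitous stop codons can be quantified. Assuming equimolar, independent bases at every position (an assumption for illustration only), 3 of the 64 codons are stops, so a stretch of n codons read in one frame is free of stops with probability (61/64)^n. The sketch applies this to the 90-300 base random inserts described in Sec. 1.1.4 and to a hypothetical pbd region with 6 fully randomized codons; it ignores frameshifts and synthesis errors, and `p_no_stop` is an illustrative name.

```python
def p_no_stop(n_codons, stop_fraction=3 / 64):
    """Probability that n independent random codons contain no stop codon."""
    return (1.0 - stop_fraction) ** n_codons


for bases in (90, 300):
    codons = bases // 3
    print(f"{bases} bases ({codons} codons): P(no stop) = {p_no_stop(codons):.4f}")

# Fraction of a library with 6 fully randomized codons that carries
# at least one gratuitous stop codon.
print(f"6 varied codons: P(>=1 stop) = {1 - p_no_stop(6):.3f}")
```

The numbers show why the placement matters: a sizable fraction of any fully randomized pbd library carries a stop, so the construct should ensure that such members display nothing at all rather than a truncated protein.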

Sec. 1.2: Displaying IPBD on Bacterial Spores:

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
Moreover, spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical
agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for Use as GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have complex
structure and morphogenesis that are species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages.

Plasmid DNA is commonly included in spores.
Plasmid-encoded proteins have been observed on the
surface of Bacillus spores (DEBR86). Sporulation
involves complex temporal regulation that is moderately
well understood (LOSI86). The sequences of several
sporulation promoters are known; coding sequences
operatively linked to such promoters are expressed only
during sporulation (RAYC87).

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and
amino-terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(EKRI88), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.

Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat (DONO87). Viable spores
that differ only slightly from wild-type are produced
in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y), of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (DEVO78,
WAIT83, WAIT86). CotC contains 16 D and E amino acids,
which nearly equal the 19 Ks. There are no A, F, R, I,
L, N, P, Q, S, or W amino acids in CotC. Neither CotC
nor CotD is post-translationally cleaved. The proteins
CotA and CotB are post-translationally cleaved.
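Composition statistics such as those quoted for CotC (19 K arranged as 9 K-K dipeptides and one isolated K, and a list of absent residue types) can be tallied from a sequence with a run-length count. The sketch below uses a short hypothetical sequence, not the CotC sequence, which is not given here; `run_lengths` and `absent_residues` are illustrative names. Note that a run of four K would be counted as one run of length 4 rather than as two dipeptides.

```python
from collections import Counter
from itertools import groupby


def run_lengths(seq: str, residue: str) -> Counter:
    """Map run length -> number of maximal runs of `residue` in `seq`."""
    return Counter(len(list(group)) for aa, group in groupby(seq) if aa == residue)


def absent_residues(seq: str, alphabet: str = "ACDEFGHIKLMNPQRSTVWY") -> str:
    """Amino acids (one-letter codes) that never occur in `seq`."""
    return "".join(aa for aa in alphabet if aa not in seq)


toy = "MKKYYDKKEKYYC"  # hypothetical sequence for illustration only
print(run_lengths(toy, "K"))  # runs of lysine
print(run_lengths(toy, "Y"))  # runs of tyrosine
print(absent_residues(toy))
```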

Endospores from the genus Bacillus are more stable
than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces



WO 90/02809                                   PCT/US89/03731



species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation are much
more developed for B. subtilis than for other
spore-forming bacteria. Thus Bacillus spores are preferred
over Streptomyces spores. Bacteria of the genus
Clostridium also form very durable endospores, but
Clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Many vegetative
biochemical pathways are shut down when sporulation
begins, so prosthetic groups might not be
available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are
post-translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not
post-translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer
as is the lac promoter of E. coli. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
codon usage.

Sec. 1.2.3; Choice of Insertion site for IPBD in OSP 
of Bacterial Spore; 

The considerations governing insertion site in the 
3 0 spore OSP are the same as those given in Section 1.1.3. 

Sec. 1.2.4: In Vivo Selection for Pseudo-osp Genes
From Random DNA Inserts in Bacterial Spores:

Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager, so that use of the random-DNA approach
becomes a more attractive option.

We can use the approach described above in Sec. 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.

Sec. 1.3: Displaying IPBD on Outer Surface of Phages:

Sec. 1.3.1: Preferred Phages for Use as GPs:

Unlike bacterial cells and spores, choice of a
phage depends strongly on knowledge of the 3D structure
of an OSP and how it interacts with other proteins in
the capsid. The size of the phage genome and the
packaging mechanism are also important because the
phage genome itself is the cloning vector. The osp-
ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.

Additional considerations in choosing phage are: 1)
the morphogenetic pathway of the phage determines the
environment in which the IPBD will have opportunity to
fold, 2) IPBDs containing essential disulfides may not
fold within a cell, 3) IPBDs needing large or insoluble
prosthetic groups may not fold if secreted because the
prosthetic group is lacking, and 4) when variegation is
introduced in Part III, multiple infections could
generate hybrid GPs that carry the gene for one PBD but
have at least some copies of a different PBD on their
surfaces; it is preferable to minimize this
possibility.

Bacteriophages are excellent candidates for GPs 
because there is little or no enzymatic activity 
associated with intact mature phage, and because the 
genes are inactive outside a bacterial host, rendering 
the mature phage particles metabolically inert. The 
filamentous phage M13 and bacteriophage PhiX174 are of 
particular interest.

Filamentous phage:

The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence
(SCHA78), the identity and function of the ten genes,
and the order of transcription and location of the
promoters) of M13 is well known, as is the physical
structure of the virion (BANN81, BOEK80, CHAN79,
ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78,
MESS78, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78,
and ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.

Relevant facts about M13 are disclosed in Example I.

Bacteriophage PhiX174:

The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector
because PhiX174 can accept almost no additional DNA;
the virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid, so that the host supplies this
protein.

Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the single-
stranded DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in virus-infected
cells.

Large DNA Phages:

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage. Phage lambda and phage T4 are not
preferred; however, derivatives of these phages could
be constructed to overcome these disadvantages.

RNA Phages:

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.

To use MS2 as a GP, we would need to eliminate
most of the natural viral genome so that an osp-ipbd
gene could fit into the protein capsid. It is known
that the A protein binds sequence-specifically to a
site at the 5' end of the + RNA strand, triggering
formation of RNA-containing particles if coat protein
is present. If a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for
the chimeric protein would get packaged. A package
comprising RNA encapsulated by proteins encoded by that
RNA satisfies the major criterion that the genetic
message inside the package specifies something on the
outside. The particles by themselves are not viable.
After isolating the packages that carry an SBD, we
would need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV
or MMTV reverse transcriptase, and

3) amplify the DNA by several cycles of polymerase
chain reaction (PCR) until there is enough to
subclone the recovered genetic message into a
plasmid for sequencing and further work.

Alternatively, helper phage could be used to rescue the
isolated phage.

Sec. 1.3.2: Preferred Outer-Surface Proteins for
Displaying IPBDs on Phages:

For a given bacteriophage, the preferred OSP is
usually one that is present on the phage surface in the
largest number of copies, as this allows the greatest
flexibility in varying the ratio of OSP-IPBD to wild-
type OSP and also gives the highest likelihood of
obtaining satisfactory affinity separation. Moreover,
a protein present in only one or a few copies usually
performs an essential function in morphogenesis or
infection; mutating such a protein by addition or
insertion is likely to reduce the viability of the GP.

It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted
either into a second copy of the recipient osp gene or
into a novel engineered osp gene. The preferred OSP
for use when the GP is M13 is the gene III protein (see
Example I).

Sec. 1.3.3: Choice of Insertion site for IPBD in OSP:

The user must choose a site in the candidate OSP
gene for inserting an ipbd gene fragment. The coats of
most bacteriophage are highly ordered. Thus in
bacteriophage, unlike the cases of bacteria and spores,
it is important to retain most or all of the residues
of the parental OSP in engineered OSP-IPBD fusion
proteins. A preferred site for insertion of the ipbd
gene into the phage osp gene is one in which: a) the
IPBD folds into its original shape, b) the OSP domains
fold into their original shapes, and c) there is no
interference between the two domains.

If there is a 3D model of the phage that indicates
that either the amino or carboxy terminus of an OSP is
exposed to solvent, then the exposed terminus of that
mature OSP becomes the prime candidate for insertion of
the ipbd gene. A low-resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best
candidates for insertion of the ipbd gene. A
functional fusion may require additional residues
between the IPBD and OSP domains to avoid unwanted
interactions between the domains. Random-sequence DNA,
or DNA coding for a specific sequence of a protein
homologous to the IPBD or OSP, can be inserted between
the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion.
Smith exploited such a boundary when subcloning
heterologous DNA into gene III of f1 (SMIT85).

There are several methods of identifying domains.
Methods that rely on atomic coordinates have been
reviewed by Janin and Chothia (JANI85); see also ROSE85,
RASH84, VITA84, PABO79, POTE83, and SCOT87.

If the only structural information available is
the amino acid sequence of the candidate OSP, we use
the sequence to predict turns and loops. There is a
high probability that some of the loops and turns will
be correctly predicted (cf. Chou and Fasman (CHOU72));
these locations are also candidates for insertion of
the ipbd gene fragment.
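
As a rough illustration only, the sketch below scans a sequence for short windows enriched in residues that are frequently found in turns (Gly, Pro, Asn, Asp, Ser). This is a deliberately simplified stand-in, not the Chou-Fasman method cited above, which uses empirically derived conformational parameters; the window length, the threshold, and the example fragment are hypothetical choices.

```python
# Crude turn-former heuristic (an assumption), NOT the Chou-Fasman algorithm.
TURN_FORMERS = set("GPNDS")  # residues commonly enriched in turns


def turn_candidates(seq: str, window: int = 4, min_formers: int = 3) -> list[int]:
    """Return 0-based start positions of windows containing at least
    `min_formers` turn-forming residues among `window` residues."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[start:start + window] if aa in TURN_FORMERS)
        if count >= min_formers:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical fragment: helix-like stretch, a G/P/N/D-rich loop, more helix.
    fragment = "AEELLKKAEELGPNDGKALLEEAKK"
    print(turn_candidates(fragment))  # -> [10, 11, 12, 13]
```

Flagged windows would only be candidates; as the text notes, some predictions will be wrong, so several insertion sites would normally be tried.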

Sec. 1.3.4: In Vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Phages:

Alternatively, a functional insertion site may be
determined by generating a number of recombinant
constructions and selecting the functional strain by
phenotypic characteristics. Because the OSP-IPBD must
fulfill a structural role in the phage coat, it is
unlikely that any particular random DNA sequence
coupled to the ipbd gene will produce a fusion protein
that fits into the coat in a functional way.
Nevertheless, random DNA inserted between large
fragments of a coat protein gene and the ipbd gene will
produce a population that is likely to contain one or
more members that display the IPBD on the outside of a
viable phage. A display probe, similar to that defined
in Sec. 1.1.4, is constructed, and random DNA sequences
are cloned into appropriate sites.
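
The statement that a random-insert population is "likely to contain one or more" displaying members can be made quantitative under a simple independence assumption: if each clone independently has probability f of yielding a functional, displaying fusion, then a library of n independent clones contains at least one such clone with probability 1 - (1 - f)^n. The value of f is not known in advance; the numbers used below are hypothetical.

```python
import math


def p_at_least_one(f: float, n: int) -> float:
    """Probability that n independent clones include >= 1 functional fusion."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a probability")
    if f == 1.0:
        return 1.0
    # 1 - (1 - f)**n, computed with log1p/expm1 to stay accurate for tiny f
    return -math.expm1(n * math.log1p(-f))


def clones_needed(f: float, confidence: float) -> int:
    """Smallest n such that p_at_least_one(f, n) >= confidence."""
    if not (0.0 < f < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("need 0 < f < 1 and 0 < confidence < 1")
    return math.ceil(math.log1p(-confidence) / math.log1p(-f))


if __name__ == "__main__":
    # Hypothetical: one clone in 10,000 gives a viable, displaying phage.
    n = clones_needed(1e-4, 0.95)
    print(n, round(p_at_least_one(1e-4, n), 3))
```

With these assumed figures, roughly 30,000 independent transformants give about 95% confidence of including at least one displaying clone; a smaller f scales the required library size up proportionally.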

Sec. 2: Choice of IPBD:

An IPBD may be chosen from naturally occurring
proteins or domains of naturally occurring proteins, or
may be designed from first principles. A designed
protein may have advantages over natural proteins if:
a) the designed protein is more stable, b) the designed
protein is smaller, and c) the charge distribution of
the designed protein can be specified more freely.

A candidate IPBD must meet the following criteria:
1) stability under the conditions of its intended use
(the domain may comprise the entire protein that will
be inserted, e.g. BPTI), 2) availability of the amino acid
sequence, 3) identification of the
residues on the outer surface, and their spatial
relationships, and 4) availability of a molecule,
AfM(IPBD), having high specific affinity for the IPBD.

Preferably, the IPBD is no larger than necessary
because it is easier to arrange restriction sites in
smaller amino-acid sequences. The usefulness of
candidate IPBDs that meet all of these requirements
depends on the availability of the information
discussed below.

Information used to judge IPBD suitability
includes: 1) a 3D structure (knowledge strongly
preferred), 2) one or more sequences homologous to the
IPBD (the more homologous sequences known, the better),
3) the pI of the IPBD (knowledge necessary in some
cases), 4) the stability and solubility as a function
of temperature, pH, and ionic strength (preferably known
to be stable over a wide range and soluble in
conditions of intended use), 5) ability to bind metal
ions such as Ca2+ or Mg2+ (knowledge preferred; binding
per se no preference), 6) enzymatic activities, if any
(knowledge preferred; activity per se has uses but may
cause problems), 7) binding properties, if any
(knowledge preferred, specific binding also preferred),
8) availability of a molecule having specific and
strong affinity ($K_d < 10^{-11}$ M) for the IPBD
(preferred), 9) availability of a molecule having
specific and medium affinity ($10^{-8}$ M $< K_d < 10^{-6}$ M)
for the IPBD (preferred), 10) the sequence of a mutant
of IPBD that does not bind to the affinity molecule(s)
(preferred), and 11) absorption spectrum in visible,
UV, NMR, etc. (characteristic absorption preferred).

If only one species of molecule having affinity
for IPBD (AfM(IPBD)) is available, it will be used to:
a) detect the IPBD on the GP surface, b) optimize
expression level and density of the affinity molecule
on the matrix (Sec. 10.1), and c) determine the
efficiency and sensitivity of the affinity separation
(Secs. 10.2 and 10.3). As noted above, however, one
would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be
used in initial detection and in determining efficiency
and sensitivity (Secs. 10.2 and 10.3), and the species with
moderate affinity would be used in optimization (Sec. 10.1).

For at least 20 candidate IPBDs the above
information is available or is practical to obtain, for
example, bovine pancreatic trypsin inhibitor (BPTI, 58
residues), crambin (46 residues), third domain of
ovomucoid (56 residues), T4 lysozyme (164 residues),
and azurin (128 residues).

Most of the substitutions that derive PBDs from a PPBD
according to the process of the present invention affect
residues whose side groups are directed toward the solvent.
Exposed residues can accept a wide range of amino
acids, while buried residues are more limited in this
regard (REID88). Surface mutations typically have only
small effects on the melting temperature of the PBD, but
may reduce the stability of the PBD. Hence the chosen
IPBD should have a high melting temperature (60 °C
acceptable; the higher the better) and be stable over a
wide pH range (8.0 to 3.0 acceptable; 11.0 to 2.0
preferred), so that the SBDs derived from the chosen
IPBD by mutation and selection-through-binding will
retain sufficient stability. Preferably, the
substitutions in the IPBD yielding the various PBDs do
not reduce the melting point of the domain below 50 °C.

Two general characteristics of the target
molecule, size and charge, make certain classes of
IPBDs more likely than other classes to yield
derivatives that will bind specifically to the target.
Because these are very general characteristics, one can
divide all targets into six classes: a) large positive,
b) large neutral, c) large negative, d) small positive,
e) small neutral, and f) small negative. A small
collection of IPBDs, one or a few corresponding to each
class of target, will contain a preferred candidate
IPBD for any chosen target.

Alternatively, the user may elect to engineer a
GP(IPBD) for a particular target; Secs. 2.1 and 2.2 give
criteria that relate target size and charge to the
choice of IPBD.

Sec. 2.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule,
a preferred embodiment of the IPBD is a small protein
such as BPTI from Bos taurus (58 residues), crambin
from the seed of Crambe abyssinica (46 residues), or the third domain of
ovomucoid from Coturnix coturnix japonica (Japanese
quail) (56 residues) (PAPA82), because targets from
this class have clefts and grooves that can accommodate
small proteins in highly specific ways. If the target
is a macromolecule lacking a compact structure, such as
starch, it should be treated as if it were a small
molecule. Extended macromolecules with defined 3D
structure, such as collagen, should be treated as large
molecules.

If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a
protein the size of ribonuclease from Bos taurus (124
residues), ribonuclease from Aspergillus oryzae (104
residues), hen egg white lysozyme from Gallus gallus
(129 residues), azurin from Pseudomonas aeruginosa (128
residues), or T4 lysozyme (164 residues), because such
proteins have clefts and grooves into which the small
target molecules can fit. The Brookhaven Protein Data
Bank contains 3D structures for these proteins. Genes
encoding proteins as large as T4 lysozyme can be
manipulated by standard techniques for the purposes of
this invention.

If the target is a mineral, insoluble in water,
one must consider the nature of the mineral's molecular
surface. Smooth surfaces (such as crystalline
silicon) require medium to large proteins (such as
ribonuclease) as IPBD in order to have sufficient
contact area and specificity. Rough, grooved surfaces
(zeolites) could be bound either by small proteins
(BPTI) or larger proteins (T4 lysozyme).

Sec. 2.2: Influence of target charge on choice of
IPBD:

Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred
that, under the conditions of intended use, the IPBD
and the target molecule either have opposite charge or
that one of them is neutral. Inclusion of counter ions
can reduce or eliminate electrostatic repulsion.
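
For protein IPBDs and protein targets, whether the two carry like charges at the pH of intended use can be estimated from sequence with the Henderson-Hasselbalch equation. The sketch below uses one common set of approximate pKa values; these values are an assumption (published sets differ, and pKa values in a folded protein can shift substantially), all Cys are treated as free thiols, and the "neutral band" is a hypothetical cutoff. The result is only a first estimate.

```python
# Approximate pKa values: one common textbook-style set (an assumption).
POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 8.0}
NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 3.1}


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge of a polypeptide at the given pH.

    Treats every Cys as a free thiol; disulfide-bonded Cys (as in BPTI)
    would really contribute nothing, so this slightly overstates negative charge.
    """
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(n / (1.0 + 10 ** (ph - POSITIVE[g]))
              for g, n in counts.items() if g in POSITIVE)
    neg = sum(n / (1.0 + 10 ** (NEGATIVE[g] - ph))
              for g, n in counts.items() if g in NEGATIVE)
    return pos - neg


def like_charged(q1: float, q2: float, neutral_band: float = 0.5) -> bool:
    """True if both charges lie outside a hypothetical neutral band with the same sign."""
    return (q1 > neutral_band and q2 > neutral_band) or (
        q1 < -neutral_band and q2 < -neutral_band)


if __name__ == "__main__":
    bpti = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
    print(round(net_charge(bpti, 7.0), 1))  # clearly positive at pH 7
```

A pair flagged by `like_charged` would, per the text, either steer the choice toward a different IPBD or call for counter ions in the binding buffer.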

Sec. 2.3: Other aspects of choice of IPBD:

If the chosen IPBD is an enzyme, it may be
necessary to change one or more residues in the active
site to inactivate enzyme function. For example, if
the IPBD were T4 lysozyme and the GP were E. coli cells
or M13, we would inactivate the lysozyme lest it lyse
the cells. If, on the other hand, the GP were PhiX174,
then inactivation of lysozyme may not be needed because
T4 lysozyme can be overproduced inside E. coli cells
without detrimental effects and PhiX174 forms
intracellularly. It is preferred to inactivate enzyme
IPBDs that might be harmful to the GP or its host by
substituting mutant amino acids at one or more residues
of the active site. It is permitted to vary one or
more of the residues that were changed to abolish the
original enzymatic activity of the IPBD. Those GPs
that receive osp-pbd genes encoding an active enzyme
may die, but the majority of sequences will not be
deleterious.

Sec. 3: Choice of OCV:

The OCV is preferably small, e.g., less than 10
kb. It is desirable that cassette mutagenesis be
practical in the OCV; preferably, at least 25
restriction enzymes are available that do not cut the
OCV. It is likewise desirable that single-stranded
mutagenesis be practical. Finally, the OCV preferably
carries a selectable marker. A suitable OCV is
obtained or is engineered by manipulation of available
vectors. Plasmids are preferred over the bacterial
chromosome because genes on plasmids are much more
easily constructed and mutated than are chromosomal
genes. When bacteriophage are to be used, the osp-ipbd
gene must be inserted into the phage genome.
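
Whether a candidate OCV leaves enough restriction enzymes without sites can be checked by a simple string search over both strands. The sketch below uses a small, hand-picked dictionary of well-known six-base recognition sequences; a real survey would draw on a complete enzyme database such as REBASE. The vector is treated as circular, and the example sequence in the usage note is made up.

```python
# A few common type II enzymes and their recognition sequences (5'->3').
ENZYMES = {
    "EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT",
    "XbaI": "TCTAGA", "SalI": "GTCGAC", "PstI": "CTGCAG",
    "KpnI": "GGTACC", "SacI": "GAGCTC", "NcoI": "CCATGG",
    "NdeI": "CATATG", "XhoI": "CTCGAG", "SphI": "GCATGC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(vector: str, site: str, circular: bool = True) -> int:
    """Count occurrences of `site` on either strand (palindromic sites counted once)."""
    vector = vector.upper()
    # For a circular vector, append the first len(site)-1 bases so that
    # sites spanning the origin are found.
    search = vector + vector[: len(site) - 1] if circular else vector
    hits = set()
    for probe in {site, revcomp(site)}:
        start = search.find(probe)
        while start != -1:
            if start < len(vector):
                hits.add((start, probe))
            start = search.find(probe, start + 1)
    return len(hits)


def non_cutters(vector: str) -> list[str]:
    """Names of enzymes in ENZYMES that have no site in the vector."""
    return sorted(name for name, site in ENZYMES.items()
                  if count_sites(vector, site) == 0)
```

For example, `non_cutters("GAATTC" + "A" * 30 + "GGATCC" + "T" * 30)` excludes EcoRI and BamHI but includes HindIII. With a full enzyme list, the criterion in the text corresponds to `len(non_cutters(ocv)) >= 25`.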

For phage such as M13, an antibiotic resistance
gene is engineered into the genome (HINE80). More
virulent phage, such as PhiX174, make discernible
plaques that can be picked, in which case a resistance
gene is not essential; furthermore, there is no room in
the PhiX174 virion to add any new genetic material.
Inability to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that
can be screened.

It is preferred that GP(IPBD) carry a selectable
marker not carried by wtGP. It is also preferred that
wtGP carry a selectable marker not carried by GP(IPBD).

Sec. 4: Designing the osp-ipbd gene Insert:

We design an amino acid sequence that will cause
the IPBD to appear on the GP surface when it is
expressed. This amino acid sequence may determine the
entire coding region of the osp-ipbd gene, or it may
contain only the ipbd sequence adjoining restriction
sites into which random DNA will be cloned (Sec. 6.2).

The actual gene may be produced by any means. The
pbd segment, derived from the ipbd segment, must be
easily genetically manipulated in the ways described in
Part III. Synthetic ipbd segments are preferred
because they allow greatest control over placement of
restriction sites.

Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Regarding regulation of the osp-ipbd gene, the two
important questions are: a) how much OSP-IPBD do we
need on each GP, and b) how accurately must we regulate
the amount?

The essential function of the affinity separation
is to separate GPs that bear PBDs (derived from IPBD)
having high affinity for the target from GPs bearing
PBDs having low affinity for the target. If a gradient
of some solute, such as increasing salt, changes the
conditions, then all weakly-binding PBDs will cease to
bind before any strongly-binding PBDs cease to bind.
Regulation of the osp-pbd gene must be such that all
packages display sufficient PBD to effect a good
separation in Sec. 15. If the amount of PBD/GP had an
effect on the elution volume of the GP from the
affinity matrix, then we would need to regulate the
amount of PBD/GP accurately. The following analysis
shows that there is no strong linear effect of IPBD/GP
on elution volume and assumes only: a) that all GPs are
the same size, b) that interactions between the PBDs
and the affinity matrix dominate differential elution
of GPs, c) that the system is at equilibrium, and d)
that all PBDs on any one GP are identical.

If $N_p$ identical PBDs on a GP each have access to
target molecules, and each PBD has a free energy of
binding to the target of $\Delta G_1$, then the total free
energy of binding is

$$\Delta G_{\mathrm{total}} = N_p \, \Delta G_1 .$$

$\Delta G_1$ is a function of parameters of the solvent,
such as: 1) concentration of ions, 2) pH, 3)
temperature, 4) concentration of neutral solutes such
as sucrose, glucose, ethanol, etc., and 5) specific ions,
such as calcium, acetate, benzoate, nicotinate, etc.
If conditions are altered during affinity separation so
that $\Delta G_1$ approaches zero, $\Delta G_{\mathrm{total}}$ approaches
zero $N_p$ times faster. As $\Delta G_{\mathrm{total}}$ goes to or above
zero, the packages will dissociate from the immobilized
target molecules and be eluted.

GPs bearing more PBDs have a sharper transition
between bound and unbound than packages with fewer of
the same PBDs. For equilibrium conditions, the mid-
point of the transition is determined only by the
solution conditions that bring the individual
interactions to zero free energy. The number of
PBDs/GP determines the sharpness of the transition.
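
The sharpening effect can be illustrated with a deliberately simplified two-state model, which is an assumption and not part of the analysis above: treat each GP as either bound or free, with the bound/free ratio equal to exp(-$\Delta G_{\mathrm{total}}$/RT) and $\Delta G_{\mathrm{total}} = N_p \Delta G_1$ (negative $\Delta G_1$ meaning favorable binding). The bound fraction is then a logistic function of $\Delta G_1$ whose midpoint is always $\Delta G_1 = 0$, while the width of the transition shrinks as $1/N_p$.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def fraction_bound(dg1_kj: float, n_p: int, temp_k: float = 298.15) -> float:
    """Fraction of GPs bound when each of n_p PBDs contributes dg1_kj (kJ/mol)."""
    x = n_p * dg1_kj / (R_KJ * temp_k)  # dG_total / RT
    # Numerically stable form of 1 / (1 + exp(x)).
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def transition_width(n_p: int, temp_k: float = 298.15) -> float:
    """Range of dG1 (kJ/mol) over which the bound fraction falls from 0.9 to 0.1."""
    return 2.0 * R_KJ * temp_k * math.log(9.0) / n_p


if __name__ == "__main__":
    for n_p in (1, 5, 50):
        row = [fraction_bound(dg, n_p) for dg in (-2.0, -0.5, 0.0, 0.5, 2.0)]
        print(n_p, [f"{f:.2f}" for f in row],
              f"width={transition_width(n_p):.2f} kJ/mol")
```

In this toy model the midpoint does not move with $N_p$, consistent with the text; the model ignores target density, geometric constraints on multivalent binding, and elution kinetics.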

It should also be noted that the number of PBDs/GP
is usually influenced by physiological conditions, so
that a sample of genetically identical GP(PBD)s may
contain GPs having different numbers of PBDs on the GP
surface. In a population of GP(vgPBD)s, each PBD
sequence will appear on more than one GP, and the
actual number of PBDs/GP will vary from GP to GP within
some range. Within a variegated population of PBDs,
let PBD_max be the PBD with maximum affinity for the
target. If there is a linear effect on elution volume
of number of PBDs/GP, then the GPs having the greatest
number of PBD_max will be most retarded on the column.
When we culture the enriched population, the GP(PBD_max)
will be amplified and give rise to new GP(PBD_max)s having
varying numbers of PBD_max/GP. Thus the affinity
separation process of the present invention could
tolerate a linear effect of number of PBDs/GP on the
elution volume of the GP(PBD) unless strong binding to
target fortuitously causes the PBD to be displayed on
the GP only in low number.

Since there is no linear effect on elution volume
from the number of IPBDs/GP, need for highly accurate
regulation of IPBD/GP is not anticipated. Reproducible
gene expression is more easily controlled using
regulated rather than constitutive genetic elements.
The analysis above assumes that GP(IPBD)s are in
equilibrium between solution in buffer and bound to the
affinity matrix. Rate of elution may be an important
parameter in column affinity chromatography. In batch
elution from an affinity matrix or elution from an
affinity plate, the time that each buffer is in contact
with the affinity material may be an important
variable. The density of affinity molecules on the
matrix is an important variable in optimizing the
affinity separation. Because the analysis above is
qualitative, in Sec. 10 of the preferred embodiment we
experimentally optimize: 1) the density of IPBD on the
GP surface, 2) the density of affinity molecules on the
affinity matrix, 3) the initial ionic strength, 4) the
elution rate, and 5) the quantity of GP/(volume of
matrix) to be loaded on the column.

Transcriptional regulation of gene expression is
best understood and most effective, so we focus our
attention on the promoter. A number of promoters are
known that can be controlled by specific chemicals
added to the culture medium. For example, the lacUV5
promoter is induced if isopropylthiogalactoside (IPTG) is
added to the culture medium, for example, at between
1.0 μM and 10.0 mM. Hereinafter, we use "XINDUCE" as a
generic term for a chemical that induces expression of
a gene. If transcription of the osp-ipbd gene is
controlled by XINDUCE, then the number of OSP-IPBDs per
GP increases for increasing concentrations of XINDUCE
until a fall-off in the number of viable packages is
observed or until sufficient IPBD is observed on the
surface of harvested GP(IPBD)s.
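
The titration just described can be written as a simple loop. In the sketch below, `measure_display` and `measure_viability` are hypothetical placeholders for whatever assays the user employs (for example, binding of AfM(IPBD) to harvested GPs and a titer of viable packages relative to an uninduced control); the concentration series and both thresholds are likewise assumptions. The stopping rules mirror the text: stop when enough IPBD is displayed, or when viability falls off.

```python
from typing import Callable, Optional


def titrate_inducer(
    concentrations_mM: list[float],
    measure_display: Callable[[float], float],
    measure_viability: Callable[[float], float],
    display_target: float,
    viability_floor: float,
) -> tuple[Optional[float], bool]:
    """Step XINDUCE upward; return (chosen concentration, display_target reached).

    If viability drops below `viability_floor` first, return the highest
    concentration tested before the fall-off (or None) and False.
    """
    last_ok: Optional[float] = None
    for conc in sorted(concentrations_mM):
        if measure_viability(conc) < viability_floor:
            return last_ok, False  # fall-off in viable packages
        last_ok = conc
        if measure_display(conc) >= display_target:
            return conc, True  # sufficient IPBD on the GP surface
    return last_ok, False


if __name__ == "__main__":
    # Half-log series from 1 uM to 10 mM, matching the range quoted above.
    series = [0.001 * 10 ** (k / 2) for k in range(9)]
    display = lambda c: 100 * c / (c + 0.1)  # hypothetical saturating response
    viability = lambda c: 1.0 if c < 5 else 0.3  # hypothetical toxicity at high dose
    print(titrate_inducer(series, display, viability, 40, 0.5))
```

Returning the last tolerated concentration on a viability fall-off lets the user still work at the highest display level the GP survives, which is one reasonable reading of the stopping rule.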

The attributes that affect the maximum number of
OSP-IPBDs per GP are primarily structural in nature.
There may be steric hindrance or other unwanted
interactions between IPBDs if OSP-IPBD is substituted
for every wild-type OSP. Excessive levels of OSP-IPBD
may also adversely affect the solubility or
morphogenesis of the GP. For cellular and viral GPs,
as few as five copies of a protein having affinity for
another immobilized molecule have resulted in
successful affinity separations (FERE82a, FERE82b, and
SMIT85).

Another consideration of promoter regulation is
that it is useful later to know the range of regulation
of the osp-ipbd gene (Sec. 8). In particular, one should
determine how nearly the absence of XINDUCE leads to
the absence of IPBD on the GP surface; a non-leaky
promoter is preferred. Non-leakiness is useful: a) to
show that affinity of GP(osp-ipbd)s for AfM(IPBD) is
due to the osp-ipbd gene, and b) to allow growth of
GP(osp-pbd) in the absence of XINDUCE if the expression
of osp-pbd is disadvantageous. The lacUV5 promoter in
conjunction with the LacI^q repressor is a preferred
example.

Sec. 4.2: DNA sequence design:

The present invention is not limited to a single
method of gene design. The following procedure is an
example of one method of gene design that fills the
needs of the present invention.

If the amino-acid sequence of OSP-IPBD is a
definite sequence, then the entire gene will be
constructed (Sec. 6.1). If random DNA is to be fused
to ipbd, then a "display probe" is constructed first;
the random DNA is then inserted to complete the
population of putative osp-ipbd genes (Sec. 6.2) from
which a functional osp-ipbd gene is identified by in
vivo selection or kindred techniques.

One may use any genetic engineering method to
produce the correct gene fusion, so long as one can
easily and accurately direct mutations to specific
sites in the pbd DNA subsequence (Sec. 14.1). For the
methods of mutagenesis considered here, however, the
DNA sequence for the osp-ipbd gene must be different
from any other DNA in the OCV. The degree and nature
of difference needed is determined by the method of
mutagenesis. If one replaces subsequences coding for the
PBD with vgDNA, then the subsequences to be mutagenized
must be bounded by restriction sites that are unique
within the OCV. If single-stranded-oligonucleotide-
directed mutagenesis is to be used, then the DNA
sequence of the subsequence coding for the IPBD must be
unique within the OCV.
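
Both uniqueness requirements in the preceding paragraph reduce to counting occurrences on both strands of the OCV. The sketch below is a minimal check under simple assumptions: sequences contain only A, C, G, and T; the OCV is treated as circular; and the example sequences are invented for illustration. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def occurrences(ocv: str, motif: str) -> int:
    """Count positions where `motif` or its reverse complement occurs in a circular OCV."""
    ocv, motif = ocv.upper(), motif.upper()
    wrapped = ocv + ocv[: len(motif) - 1]  # catch matches spanning the origin
    positions = set()
    for probe in {motif, revcomp(motif)}:
        i = wrapped.find(probe)
        while i != -1 and i < len(ocv):
            positions.add((i, probe))
            i = wrapped.find(probe, i + 1)
    return len(positions)


def flanking_sites_ok(ocv: str, site_a: str, site_b: str) -> bool:
    """Cassette mutagenesis: both bounding restriction sites must be unique."""
    return occurrences(ocv, site_a) == 1 and occurrences(ocv, site_b) == 1


def oligo_target_unique(ocv: str, target: str) -> bool:
    """Oligo-directed mutagenesis: the annealing target must occur exactly once."""
    return occurrences(ocv, target) == 1


if __name__ == "__main__":
    ocv = "GAATTC" + "ACGTTGCA" * 5 + "GGATCC" + "TTTTAAAA" * 4
    print(flanking_sites_ok(ocv, "GAATTC", "GGATCC"))
```

A repeated motif such as `"ACGTTGCA"` in this toy vector would fail `oligo_target_unique`, illustrating why the text asks that the osp-ipbd DNA differ from all other DNA in the OCV.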

Regulatory elements include: a) promoters, b)
Shine-Dalgarno sequences, and c) transcriptional
terminators, and may be isolated from nature or
designed from knowledge of consensus sequences of
natural regulatory regions.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of an IPBD on the surface
of a GP, b) change of charge on an IPBD, and c)
generation of a population of PBDs from which to select
an SBD. The ambiguity in the genetic code is exploited
to allow optimal placement of restriction sites and to
create various distributions of amino acids at
variegated codons.
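
How a mixed-base codon translates into a distribution of amino acids can be computed directly from the standard genetic code. The sketch below assumes that each ambiguous position is an equimolar mixture of the bases named by its IUPAC code and ignores differences in coupling efficiency between phosphoramidites; NNK is used only as a familiar example.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AAS[i] for i, c in enumerate(product(BASES, repeat=3))}

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def aa_distribution(degenerate_codon: str) -> dict[str, Fraction]:
    """Exact amino-acid frequencies encoded by an equimolar degenerate codon."""
    choices = [IUPAC[b] for b in degenerate_codon.upper()]
    codons = ["".join(c) for c in product(*choices)]
    dist: dict[str, Fraction] = {}
    for codon in codons:
        aa = CODE[codon]
        dist[aa] = dist.get(aa, Fraction(0)) + Fraction(1, len(codons))
    return dist


if __name__ == "__main__":
    for aa, freq in sorted(aa_distribution("NNK").items(), key=lambda kv: -kv[1]):
        print(aa, freq)
```

For NNK this gives all 20 amino acids plus one stop codon (TAG) at 1/32, with Leu, Arg, and Ser each at 3/32; other mixtures can be compared the same way when choosing a variegation scheme.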

Sec. 4.3: Specific DNA sequence assignment:

A computer program may be used to identify all
possible ambiguous DNA sequences coding for an amino-
acid sequence given by the user and to identify places
where recognition sites for site-specific restriction
enzymes could be provided without altering the amino-
acid sequence.
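
A minimal version of such a program follows; it is a sketch, not the program referred to above. It relies on the fact that codon choices at different positions are independent: a recognition site can be introduced starting at a given nucleotide position without changing the protein exactly when every codon the site overlaps has at least one synonymous codon agreeing with the site's bases. The example protein fragment is hypothetical, and codon preference and mRNA secondary structure are ignored.

```python
from itertools import product

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
for i, codon in enumerate(product(BASES, repeat=3)):
    SYNONYMS.setdefault(AAS[i], []).append("".join(codon))


def silent_sites(protein: str, site: str) -> list[tuple[int, str]]:
    """List (nt_start, codons) for every position where `site` fits silently.

    `codons` is one choice of synonymous codons, covering the codons the site
    overlaps, that creates the site without changing `protein`.
    """
    hits = []
    for start in range(len(protein) * 3 - len(site) + 1):
        chosen = []
        for ci in range(start // 3, (start + len(site) - 1) // 3 + 1):
            # Keep synonymous codons whose overlapping bases match the site.
            options = [
                c for c in SYNONYMS[protein[ci]]
                if all(c[p] == site[3 * ci + p - start]
                       for p in range(3) if 0 <= 3 * ci + p - start < len(site))
            ]
            if not options:
                break
            chosen.append(options[0])
        else:  # every overlapped codon has a compatible synonym
            hits.append((start, "".join(chosen)))
    return hits


if __name__ == "__main__":
    for name, site in {"EcoRI": "GAATTC", "BamHI": "GGATCC"}.items():
        print(name, silent_sites("MKGSEFGS", site))
```

Running every enzyme of interest through `silent_sites` yields the candidate positions from which sites can then be spaced as described in the next paragraph.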

Restriction sites are positioned within the osp-
ipbd gene so that the longest segment between sites is
as short as possible. Enzymes that produce cohesive
ends are preferred. The codon preferences of the
intended host and the secondary structure of the
messenger RNA are also considered.

Sec. 5.1: Organization of gene synthesis:

An established strategy for gene synthesis is to
synthesize both strands of the entire gene in
overlapping segments of 20 to 50 nucleotides (nts)
(THER88). We prefer an alternative method that is more
suitable for synthesis of vgDNA. Our method differs
from previous methods (OLIP86, OLIP87, AUSU87) in that
we: a) use two synthetic strands, and b) do not cut the
extended DNA in the middle. Our goals are: a) to
produce longer pieces of dsDNA than can be synthesized
as ssDNA on commercial DNA synthesizers, and b) to
produce strands complementary to single-stranded vgDNA.
By using two synthetic strands, we remove the
requirement for a palindromic sequence at the 3' end.

20 

DNA synthesizers can produce oligo-nts of up to 
100 nts in reasonable yield, ^dnh " 100* The 
parameters N^ (the length of overlap needed to obtain 
efficient annealing) and Ng (the number of spacer bases 
25 needed so that a restriction enzyme can cut near the 
end of blunt-ended dsDNA) are determined by DNA and 
enzyme chemistry. N^ 10 and Ng == 5 are reasonable 
values. 

We divide the DNA sequence to be synthesized into two nearly equal parts, each 5 to 8 bases longer than half the total length, so that there is an overlap of 10 to 16 bp (N_w) between the two parts containing no variegated bases. The overlap preferably is not palindromic and has high GC content. We synthesize the overlap portion and the 5' extension of each strand. When these strands are annealed and completed with Klenow enzyme and all four dNTPs, we obtain the desired sequence as blunt-ended dsDNA. If the DNA is to be ligated to other DNA having cohesive ends, five to ten (N_s) bases are added to that end. The synthetic dsDNA can then be cut efficiently with an appropriate restriction enzyme (OLIP87).
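A minimal Python sketch of the two-strand split described above, assuming the target duplex is given as its top-strand sequence and using the example values N_w = 10 and N_DNA = 100. The names `design_two_strands` and `overlap_ok`, and the 50% GC threshold, are illustrative assumptions; the sketch only computes the two oligo sequences and checks the overlap, and does not model annealing or the Klenow fill-in.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous (A/C/G/T) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_two_strands(seq: str, n_w: int = 10, n_dna: int = 100):
    """Split the target duplex (given as its top strand) into two oligos.

    Returns (top, bottom, overlap): `top` is the 5' part of the top strand
    ending in the n_w-base overlap; `bottom` is the bottom strand of the 3'
    part, so its 3' end is complementary to the overlap. Annealed at their
    3' ends, both oligos can be extended by a polymerase to full length.
    """
    seq = seq.upper()
    if len(seq) > 2 * n_dna - n_w:
        raise ValueError("target longer than 2*N_DNA - N_w")
    start = (len(seq) - n_w) // 2          # first base of the overlap
    top = seq[: start + n_w]
    bottom = revcomp(seq[start:])
    overlap = seq[start : start + n_w]
    return top, bottom, overlap


def overlap_ok(overlap: str, min_gc: float = 0.5) -> bool:
    """Overlap must be unvaried (A/C/G/T only), GC-rich, and not palindromic."""
    if set(overlap) - set("ACGT"):
        return False                        # contains variegated/ambiguous bases
    gc = sum(base in "GC" for base in overlap) / len(overlap)
    return gc >= min_gc and overlap != revcomp(overlap)
```

For a 190-nt target both oligos come out at exactly 100 nts, which is why 2N_DNA - N_w = 190 is the overall limit when N_w = 10.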

Because N_DNA is not rigidly fixed at 100, the current limits of 190 (= 2N_DNA - N_w) nts overall and 100 in each fragment are not rigid, but can be exceeded by 5 or 10 nts. Going beyond the limits of 190 and 100 will lead to lower yields, but these may be acceptable in certain cases.

Sec. 5.2: DNA synthesis and purification methods:

The present invention is not limited to any particular method of DNA synthesis or construction.

In the preferred embodiment, DNA is synthesized by standard means on a Milligen 7500 DNA synthesizer. The Milligen 7500 has seven vials from which phosphoramidites may be taken. Normally, the first four contain A, C, T, and G. The other three vials may contain unusual bases such as inosine or mixtures of bases, the so-called "dirty bottle". The standard software allows programmed mixing of two, three, or four bases in equimolar quantities.

The present invention is not limited to any particular method of purifying DNA for genetic engineering. Agarose gel electrophoresis and electroelution on an IBI device (International Biotechnologies, Inc., New Haven, CT) are preferably used to purify large dsDNA fragments. For oligo-nts, PAGE and electroelution with an Epigene device (Epigene Corp., Baltimore, MD) are an alternative to HPLC.

Sec. 6.1: Cloning of Known OSP-ipbd Gene into OCV:

In the preferred method, the synthetic gene is constructed using plasmids that are transformed into bacterial cells by standard methods (MANI82, p250) or slightly modified standard methods. Alternatively, DNA fragments derived from nature are operably linked to other fragments of DNA derived from nature or to synthetic DNA fragments. In most cases of the preferred method, gene synthesis involves construction of a series of plasmids containing larger and larger segments of the complete gene.

Sec. 6.2: Cloning of Random DNA (Potential osp) into Display Probe:

If random DNA and phenotypic selection or screening are used to obtain a GP(IPBD), then we clone random DNA into one of the restriction sites that was designed into the display probe.

The random DNA may be obtained in a variety of ways. Degenerate synthetic DNA is one possibility. Alternatively, pseudorandom DNA may be taken from nature. If, for example, an Sph I site (GCATG/C) has been designed into the display probe at one end of the ipbd fragment, then we would use Nla III (CATG/) to partially digest DNA that contains a wide variety of sequences, generating a wide variety of fragments with CATG 3' overhangs. Preferably, the display probe has different restriction sites at each end of the ipbd gene so that random DNA can be cloned at either end.
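To illustrate why Nla III fragments fit an Sph I end, here is a hedged Python sketch of partial digestion. It assumes each CATG site is cut independently with a fixed probability (a simplifying model of incomplete digestion, not a kinetic one); `partial_digest` is a hypothetical helper. Because Nla III cuts after CATG and Sph I cuts GCATG/C, both leave the same four-base CATG 3' overhang, so every fragment produced here can in principle be ligated into the Sph I end of the display probe.

```python
import random


def partial_digest(seq: str, p_cut: float, rng: random.Random,
                   site: str = "CATG") -> list[str]:
    """Simulate partial digestion by an enzyme that cuts the top strand
    just after `site` (Nla III: CATG/).

    Each occurrence is cut independently with probability p_cut. Fragments
    are returned as top-strand sequences, so every fragment to the left of
    a cut ends in CATG (the 3' overhang)."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        if rng.random() < p_cut:
            cuts.append(pos + len(site))
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip empty pieces (e.g. a site at the very end of seq).
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

Lowering `p_cut` lengthens the average fragment, which corresponds to the "different degree of partial digestion" option mentioned among the corrective measures.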

A plasmid carrying the display probe is digested with the appropriate restriction enzyme, and the fragmented random DNA is annealed and ligated by standard methods. The ligated plasmids are used to transform cells that are grown and selected for expression of the antibiotic-resistance gene. Plasmid-bearing GPs are then selected for the display-of-IPBD phenotype by the procedure given in Sec. 15 of the present invention, using AfM(IPBD) as if it were the target. Sec. 15 is designed to isolate GP(PBD)s that bind to a target from a large population that do not bind.

Sec. 7: Harvest of GPs:

Cells are transformed with ligated OCVs and selected for uptake of OCV after an appropriate incubation with an agent appropriate to the selectable markers on the OCV. GPs are harvested by methods appropriate to the GP at hand, generally by centrifugation to pellet the GPs and resuspension of the pellets in sterile medium (cells) or buffer (spores or phage).

Sec. 8: Verification of Display Strategy:

The harvested packages are now tested for display of IPBD on the surface; any ions or cofactors known to be essential for the stability of IPBD or AfM(IPBD) must be included at appropriate levels. The tests can be done: a) by affinity labeling, b) enzymatically, c) spectrophotometrically, d) by affinity separation, or e) by affinity precipitation. The AfM(IPBD) in this step is one picked to have strong affinity (preferably, K_d < 10"^^ M) for the IPBD molecule and little or no affinity for the wtGP. For example, if BPTI were the IPBD, trypsin, anhydrotrypsin, or antibodies to BPTI could be used as the AfM(BPTI) to test for the presence of BPTI. Anhydrotrypsin, a trypsin derivative with serine 195 converted to dehydroalanine, has no proteolytic activity but retains its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface of the GP is demonstrated through the use of a soluble, labeled derivative of an AfM(IPBD) with high affinity for IPBD. The labeled derivative of AfM(IPBD) is denoted as AfM(IPBD)*.

If random DNA has been used, then the procedures of Sec. 15 are used to obtain a clonal isolate that has the display-of-IPBD phenotype. Alternatively, clonal isolates may be screened for the display-of-IPBD phenotype. The tests of this step are applied to one or more of these clonal isolates.

If no isolates that bind to the affinity molecule are obtained, we take corrective action as disclosed in Sec. 9.

If one or more of the tests indicates that the IPBD is displayed on the GP surface, we verify that the binding of molecules having known affinity for IPBD is due to the chimeric osp-ipbd gene through the use of standard genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP to verify that loss of osp-ipbd causes loss of binding,

3) showing that binding of GPs to AfM(IPBD) correlates with [XINDUCE] (in those cases that expression of osp-ipbd is controlled by [XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is specific to the immobilized AfM(IPBD) and not to the support matrix.

Presence of IPBD on the GP surface is indicated by a strong correlation between [XINDUCE] and the reactions that are linear in the amount of IPBD (such as: a) binding of GPs by soluble AfM(IPBD)*, b) absorption caused by IPBD, and c) biochemical reactions of IPBD). The demonstration (4) that binding is to AfM(IPBD) and the genetic tests (1) and (2) are important; the test with XINDUCE (3) is less so.

We sequence the relevant ipbd gene fragment from each of several clonal isolates to determine the construction.

We establish the maximum salt concentration and pH range for which the GP(IPBD) binds the chosen AfM(IPBD).

If the IPBD is displayed on the outside of the GP, and if that display is clearly caused by the introduced osp-ipbd gene, we proceed to Part II; otherwise, we must analyze the result and adopt appropriate corrective measures.

Sec. 9: Perfecting the Display System:

If we have attempted to fuse an ipbd fragment to a natural osp fragment, our options are:

1) pick a different fusion to the same osp by

   a) using the opposite end of osp,

   b) keeping more or fewer residues from osp in the fusion; for example, in increments of 3 or 4 residues,

   c) trying a known or predicted domain boundary,

   d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to the random DNA method.

If we have just tried the random DNA method unsuccessfully, our options are:

1) choose a different relationship between ipbd fragment and random DNA (ipbd first, random DNA second, or vice versa),

2) try a different degree of partial digestion, a different enzyme for partial digestion, a different degree of shearing, or a different source of natural DNA, or

3) switch to the natural OSP method.

If all reasonable OSPs of the current GP have been tried and the random DNA method has also been tried, both without success, we pick a new GP.

Part II 

Sec. 10.0: Affinity Separation Means:

In Part II we optimize an affinity separation system that will be used in Part III to enrich a population of GP(vgPBD)s for those GP(PBD)s that display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means, but FACS, electrophoresis, or other means may also be used.

Sec. 10.1: Optimization of Affinity Chromatography Separation:

Changes in eluant concentration cause GPs to elute from the column. Elution volume, however, is more easily measured and specified. It is to be understood that the eluant concentration is the agent causing GP release and that an eluant concentration can be calculated from an elution volume and the specified gradient.

Using a specified elution regime, we compare the elution volumes of GP(IPBD)s with the elution volumes of wtGP on affinity columns supporting AfM(IPBD). Comparisons are made at various: a) amounts of IPBD/GP, b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM), c) initial ionic strengths, d) elution rates, e) amounts of GP/(volume of support), f) pHs, and g) temperatures, because these are the parameters most likely to affect the sensitivity and efficiency of the separation. We then pick those conditions giving the best separation.

We do not optimize pH or temperature; rather, we record optimal values for the other parameters for one or more values of pH and temperature. The conditions of intended use, specified by the user (Sec. 11), may include a specification of pH or temperature. If pH is specified, then pH will not be varied in eluting the column (Sec. 15.3). Decreasing pH may be used to liberate bound GPs from the matrix. If the intended use specifies a temperature, we will hold the affinity column at the specified temperature during elution, but we might vary the temperature during recovery.

The AfM(IPBD) is preferably one known to have moderate affinity for the IPBD (K_d in the range 10""^ M to 10*"^ M). When populations of GP(vgPBD)s are fractionated, there will be roughly three subpopulations: a) those with no binding, b) those that have some binding but can be washed off with high salt or low pH, and c) those that bind very tightly and must be rescued in situ. We optimize the parameters to separate (a) from (b) rather than (b) from (c). Let PBD_w be a PBD having weak binding to the target and PBD_s be a PBD having strong binding. Higher DoAMoM might, for example, favor retention of GP(PBD_w) but also make it very difficult to elute viable GP(PBD_s). We will optimize the affinity separation to retain GP(PBD_w) rather than to allow release of GP(PBD_s), because a tightly bound GP(PBD_s) can be rescued by in situ growth. If we find that DoAMoM strongly affects the elution volume, then in Part III we may reduce the amount of target on the affinity column when an SBD has been found with moderately strong affinity (K_d on the order of 10"'^ M) for the target.

In this step we measure elution volumes of genetically pure GPs that elute from the affinity matrix as sharp bands that can be detected by UV absorption. Samples from effluent fractions are plated on suitable medium (cells or spores) or on sensitive cells (phage), and colonies or plaques are counted.

Several values of IPBD/GP, DoAMoM, elution rates, initial ionic strengths, and loadings should be examined. We anticipate that optimal values of IPBD/GP and DoAMoM will be correlated and therefore should be optimized together. The effects of initial ionic strength, elution rate, and amount of GP/(matrix volume) are unlikely to be strongly correlated, and so they can be optimized independently.

For each set of parameters to be tested, the column is eluted in a specified manner. For example, we may use a regime called Elution Regime 1: a KCl gradient runs from 10 mM to the maximum allowed for GP(IPBD) viability in 100 fractions of 0.05 V_v (void volume), followed by 20 fractions of 0.05 V_v at the maximum allowed KCl; the pH is maintained at the specified value with a convenient buffer such as Tris. It is important that the conditions of this optimization be similar to the conditions that are used in Part III for selection for binding to target (Sec. 15.3) and recovery of GPs from the chromatographic system (Sec. 15.4).
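An elution regime of this kind can be written down as a fraction-by-fraction schedule, which also makes it easy to convert an elution volume back into the eluant concentration that released the GPs. This Python sketch assumes a linear KCl gradient, a 10 mM starting concentration, and fractions of 0.05 V_v; `ElutionFraction`, `elution_regime_1`, and `kcl_at_volume` are hypothetical names, and the maximum KCl must be supplied from the measured viability limit of the GP(IPBD).

```python
from dataclasses import dataclass


@dataclass
class ElutionFraction:
    index: int        # 1-based fraction number
    volume_vv: float  # fraction volume, in units of the void volume V_v
    kcl_mM: float     # KCl concentration applied during this fraction


def elution_regime_1(kcl_max_mM: float, kcl_start_mM: float = 10.0,
                     n_gradient: int = 100, n_hold: int = 20,
                     frac_vv: float = 0.05) -> list[ElutionFraction]:
    """Linear KCl gradient over n_gradient fractions, then n_hold
    fractions held at kcl_max_mM."""
    schedule = []
    for i in range(n_gradient):
        # Linear interpolation: first fraction at the start, last at the maximum.
        kcl = kcl_start_mM + (kcl_max_mM - kcl_start_mM) * i / (n_gradient - 1)
        schedule.append(ElutionFraction(i + 1, frac_vv, kcl))
    for j in range(n_hold):
        schedule.append(ElutionFraction(n_gradient + j + 1, frac_vv, kcl_max_mM))
    return schedule


def kcl_at_volume(schedule: list[ElutionFraction], elution_volume_vv: float) -> float:
    """KCl concentration of the fraction in which a cumulative elution volume falls."""
    cumulative = 0.0
    for frac in schedule:
        cumulative += frac.volume_vv
        if elution_volume_vv <= cumulative + 1e-12:
            return frac.kcl_mM
    return schedule[-1].kcl_mM  # beyond the schedule: last (maximum) concentration
```

Keeping the schedule as explicit data means the same object can be reused unchanged for the selection runs of Part III, which the text requires to use similar conditions.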

When the osp-ipbd gene is regulated by [XINDUCE], IPBD/GP can be controlled by varying [XINDUCE]. Appropriate values of [XINDUCE] depend on the identity of XINDUCE and the promoter; if, for example, XINDUCE is isopropylthiogalactoside (IPTG) and the promoter is lacUV5, then [IPTG] = 0, 0.1 µM, 1.0 µM, 10.0 µM, 100.0 µM, and 1.0 mM are appropriate levels to test. The range of variation of [XINDUCE] is extended until an optimum is found or an acceptable level of expression is obtained.

DoAMoM is varied from the maximum that the matrix material can bind to 1% or 0.1% of this level in appropriate steps. We anticipate that the efficiency of separation will be a smooth function of DoAMoM, so that it is appropriate to cover a wide range of values for DoAMoM with a coarse grid and then explore the neighborhood of the approximate optimum with a finer grid.
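A coarse-then-fine search over DoAMoM might look like the following Python sketch. It assumes that a separation-quality score can be measured for each DoAMoM (here a hypothetical callable `score`) and that a logarithmic grid is appropriate, since the range spans two to three orders of magnitude. `log_grid` and `coarse_to_fine` are illustrative names, not part of any described software.

```python
import math
from typing import Callable


def log_grid(lo: float, hi: float, n: int) -> list[float]:
    """n points evenly spaced on a log10 scale from lo to hi, inclusive (n >= 2)."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + k * (b - a) / (n - 1)) for k in range(n)]


def coarse_to_fine(score: Callable[[float], float], max_density: float,
                   min_fraction: float = 1e-3, n_coarse: int = 7,
                   n_fine: int = 7) -> float:
    """Return the DoAMoM with the highest score.

    Stage 1 scans a coarse log grid from min_fraction * max_density up to
    max_density; stage 2 rescans a finer log grid between the coarse
    neighbours of the best coarse point."""
    coarse = log_grid(min_fraction * max_density, max_density, n_coarse)
    best = max(range(n_coarse), key=lambda i: score(coarse[i]))
    lo = coarse[max(best - 1, 0)]
    hi = coarse[min(best + 1, n_coarse - 1)]
    return max(log_grid(lo, hi, n_fine), key=score)
```

The default `min_fraction` of 1e-3 corresponds to the 0.1% lower bound given in the text; the smoothness assumption in the text is what justifies searching only near the coarse optimum.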

Several values of initial ionic strength are tested, such as 1.0 mM, 5.0 mM, 10.0 mM, and 20.0 mM.

The elution rate is varied, by successive factors of 1/2, from the maximum attainable rate to 1/16 of this value. The fastest elution rate giving good separation is optimal.
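The test levels listed above for [IPTG], initial ionic strength, and elution rate can be generated programmatically. In this Python sketch the constant lists copy the values in the text; `halving_series` and `fastest_good_rate` are hypothetical helpers, and `separation_ok` stands for whatever acceptance criterion the experimenter applies to the measured separation.

```python
from typing import Callable, Optional

# Levels taken from the text (lacUV5 promoter induced with IPTG).
IPTG_LEVELS_UM = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]  # 0 to 1.0 mM, in uM
IONIC_STRENGTHS_MM = [1.0, 5.0, 10.0, 20.0]


def halving_series(max_rate: float, min_fraction: float = 1 / 16) -> list[float]:
    """Elution rates from max_rate down to min_fraction * max_rate by factors of 1/2."""
    rates = []
    rate = max_rate
    while rate >= max_rate * min_fraction * (1 - 1e-9):
        rates.append(rate)
        rate /= 2
    return rates


def fastest_good_rate(rates: list[float],
                      separation_ok: Callable[[float], bool]) -> Optional[float]:
    """Return the fastest rate whose separation is judged acceptable, if any."""
    for rate in sorted(rates, reverse=True):
        if separation_ok(rate):
            return rate
    return None
```

Because the text expects these parameters to be only weakly correlated, each list can be scanned on its own while the others are held fixed.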

The goal of the optimization is to obtain a sharp transition between bound and unbound GPs, triggered by increasing salt or decreasing pH or a combination of both. This optimization need be performed only: a) for each temperature to be used, b) for each pH to be used, and c) when a new GP(IPBD) is created.

Regulatable promoters are available for all genetic packages except, possibly, bacterial spores. A promoter functional in bacterial spores might be prepared by constructing a hybrid of a sporulation promoter and a regulatable bacterial promoter (e.g., lac), or by saturation mutagenesis of a sporulation promoter followed by screening for regulatable promoter activity (cf. OLIP86, OLIP87). When the promoter of the osp-ipbd gene is not regulatable, we optimize DoAMoM, the elution rate, and the amount of GP/volume of matrix. If the optimized affinity separation is not acceptable, we must develop a means to alter the amount of IPBD per GP.

Sec. 10.2: Measuring the sensitivity of affinity separation:

We determine the sensitivity of the affinity separation (C_sensi) by measuring the minimum quantity of GP(IPBD) that can be detected in the presence of a large excess of wtGP. The user chooses a number of separation cycles, denoted N_chrom, that will be performed before an enrichment is abandoned; preferably, N_chrom is in the range 6 to 10, and N_chrom must be greater than 4. Enrichment can be terminated by isolation of a desired GP(SBD) before N_chrom passes.

The measurement of sensitivity is significantly expedited if GP(IPBD) and wtGP carry different selectable markers.

Mixtures of GP(IPBD) and wtGP are prepared in the ratios of 1:V_lim, where V_lim ranges by an appropriate factor (e.g., 1/10) over an appropriate range, typically 10^^ through 10^. Large values of V_lim are tested first; once a positive result is obtained for one value of V_lim, no smaller values of V_lim need be tested. Each mixture is applied to a column supporting, at the optimal DoAMoM, an AfM(IPBD) having high affinity for IPBD, and the column is eluted by the specified elution regime. The last fraction that contains viable GPs and an inoculum of the column matrix material are cultured. If GP(IPBD) and wtGP have different selectable markers, then transfer onto selection plates identifies each colony. Otherwise, a number (e.g., 32) of GP clonal isolates are tested for presence of IPBD by the techniques discussed in Sec. 8.

If IPBD is not detected on the surface of any of the isolated GPs, then GPs are pooled from: a) the last few (e.g., 3 to 5) fractions that contain viable GPs, and b) an inoculum taken from the column matrix. The pooled GPs are cultured, passed over the same column, and enriched for GP(IPBD) in the manner described. This process is repeated until N_chrom passes have been performed, or until the IPBD has been detected on the GPs. If GP(IPBD) is not detected after N_chrom passes, V_lim is decreased and the process is repeated.

C_sensi is the highest value of V_lim for which the user can recover GP(IPBD) within N_chrom passes. The number of chromatographic cycles (K_cyc) that were needed to isolate GP(IPBD) gives a rough estimate of C_eff; C_eff is approximately the K_cyc-th root of V_lim:

$$C_{\mathrm{eff}} \approx \exp\left(\frac{\ln V_{\mathrm{lim}}}{K_{\mathrm{cyc}}}\right) = V_{\mathrm{lim}}^{1/K_{\mathrm{cyc}}}$$

For example, if V_lim were 4.0 × 10^8 and three separation cycles were needed to isolate GP(IPBD), then C_eff ≈ 736.
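The rough estimate C_eff ≈ exp(ln(V_lim)/K_cyc) = V_lim^(1/K_cyc) follows from a simple model in which every chromatographic pass multiplies the GP(IPBD):wtGP ratio by the same factor C_eff. The Python sketch below encodes that model; the constant-enrichment assumption, the detection threshold `detect_ratio`, and all function names are illustrative assumptions rather than part of the described procedure.

```python
import math
from typing import Iterable, Optional


def c_eff_estimate(v_lim: float, k_cyc: int) -> float:
    """Rough per-cycle enrichment: C_eff ~ exp(ln(V_lim) / K_cyc)."""
    return math.exp(math.log(v_lim) / k_cyc)


def passes_to_detect(v_lim: float, c_eff: float, n_chrom: int,
                     detect_ratio: float = 1.0) -> Optional[int]:
    """Pass at which GP(IPBD):wtGP first reaches detect_ratio, assuming each
    pass multiplies the ratio by c_eff. Returns None if N_chrom passes
    are not enough."""
    if n_chrom <= 4:
        raise ValueError("N_chrom must be greater than 4")
    ratio = 1.0 / v_lim                        # mixture prepared at 1:V_lim
    for k in range(1, n_chrom + 1):
        ratio *= c_eff
        if ratio >= detect_ratio * (1 - 1e-9):  # tolerance for float rounding
            return k
    return None


def sensitivity(v_lims: Iterable[float], c_eff: float, n_chrom: int,
                detect_ratio: float = 1.0) -> Optional[float]:
    """Test V_lim values largest first; the first one recovered within
    N_chrom passes is returned (smaller values need not be tested)."""
    for v in sorted(v_lims, reverse=True):
        if passes_to_detect(v, c_eff, n_chrom, detect_ratio) is not None:
            return v
    return None
```

With V_lim = 4.0 × 10^8 and K_cyc = 3, `c_eff_estimate` gives about 736.8. Real enrichments need not be constant from pass to pass, so the estimate is only as rough as the text says.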

Sec. 10.3: Measuring the efficiency of separation:

To determine C_eff more accurately, we determine the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column that yields approximately equal amounts of GP(IPBD) and wtGP after elution.

Sec. 10.4: Other Separation Means:

Other separation means are optimized in a manner parallel to that used for affinity chromatography.

FACS (e.g., FACStar from Becton-Dickinson, Mountain View, CA) is most appropriate for bacterial cells and spores because the sensitivity of the machines requires approximately 1000 molecules of fluorescent label bound to each GP to accomplish a separation. To optimize FACS separation of GPs, we use a derivative of AfM(IPBD) that is labeled with a fluorescent molecule, denoted AfM(IPBD)*. The variables that must be optimized include: a) amount of IPBD/GP, b) concentration of AfM(IPBD)*, c) ionic strength, d) concentration of GPs, and e) parameters pertaining to operation of the FACS machine. Because AfM(IPBD)* and GPs interact in solution, the binding will be linear in both [AfM(IPBD)*] and [displayed IPBD]. Preferably, these two parameters are varied together. The other parameters can be optimized independently.

Electrophoresis is most appropriate to bacteriophage because of their small size (SERW87). Electrophoresis is a preferred separation means if the target is so small that chemically attaching it to a column or to a fluorescent label would essentially change the entire target. For example, chloroacetate ions contain only seven atoms and would be essentially altered by any linkage. GPs that bind chloroacetate would become more negatively charged than GPs that do not bind the ion, and so these classes of GPs could be separated.

The parameters to optimize for electrophoresis include: a) IPBD/GP, b) concentration of gel material, e.g., agarose, c) concentration of AfM(IPBD), d) ionic strength, e) size, shape, and cooling capacity of the electrophoresis apparatus, f) voltages and currents, and g) concentration of GPs. Preferably, IPBD/GP and [AfM(IPBD)] are varied at the same time, and other parameters are optimized independently.

Part III 

Sec. 11.0: Choice of target material:

Any material may be chosen as target material, subject only to the following restrictions:

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be applied to a solid support suitable for affinity separation,

2) after application to a matrix, the target material must not react with water,

3) after application to a matrix, the target material must not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a matrix allows enough unaltered surface area (generally at least 500 Å², excluding the atom that is connected to the linker) for protein binding.

If FACS is to be used as the affinity separation means, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be conjugated to a suitable fluorescent dye, or the target must itself be fluorescent,

2) after any necessary fluorescent labeling, the target must not react with water,

3) after any necessary fluorescent labeling, the target material must not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a suitable dye allows enough unaltered surface area (generally at least 500 Å², excluding the atom that is connected to the linker) for protein binding.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a 
nature that its binding to a protein will change 
the charge of the protein, 

2) the target material must not react with water, 

3) the target material must not bind or degrade 
proteins in a non-specific way, and 

4) the target must be compatible with a suitable 
gel material. 

Possible target materials include, but are not limited to: a) soluble proteins (such as horse heart myoglobin, human neutrophil elastase, activated (blood clotting) factor X, alpha-fetoprotein, alpha interferon, melittin, Bordetella pertussis adenylate cyclase toxin, any retroviral pol protease, or any retroviral gag protease), b) lipoproteins (such as human low density lipoprotein), c) glycoproteins (such as a monoclonal antibody), d) lipopolysaccharides (such as O-antigen of Salmonella enteritidis), e) nucleic acids (such as tRNAs, ribosomal RNAs, messenger RNAs, dsDNA, or ssDNA, possibly with sequence specificity), f) soluble organic molecules (such as cholesterol, aspartame, bilirubin, morphine, codeine, dichlorodiphenyltrichloroethane (DDT), benzo(a)pyrene, prostaglandin PGE2, protoporphyrin IX, or actinomycin D), g) organometallic complexes (such as iron haem or cobalt haem), h) organic polymers (such as cellulose or chitin), i) insoluble minerals (such as asbestos, zeolites, or hydroxylapatite), j) viral and phage coat proteins (such as influenza haemagglutinin or phage lambda capsid), and k) bacterial membrane or outer membrane proteins (such as LamB from E. coli or flagella proteins).

A supply of several milligrams of pure target material is desired. Impure target material could be used, but one might obtain a protein that binds to a contaminant instead of to the target.

The following information about the target material is highly desirable:

1) stability as a function of temperature, pH, and ionic strength,

2) stability with respect to chaotropes such as urea or guanidinium Cl,

3) pI,

4) molecular weight,

5) requirements for prosthetic groups or ions, such as haem or Ca2+, and

6) proteolytic activity, if any.

In addition to this most desirable information, it is useful to know: 1) the target's sequence, if the target is a macromolecule, 2) the 3D structure of the target, 3) enzymatic activity, if any, and 4) toxicity, if any.

The user of the present invention specifies certain parameters of the intended use of the binding protein:

1) the acceptable temperature range,

2) the acceptable pH range,

3) the acceptable concentrations of ions and neutral solutes,

4) the maximum acceptable dissociation constant for the target and the SBD:

$$K_T = \frac{[\mathrm{Target}][\mathrm{SBD}]}{[\mathrm{Target{:}SBD}]}$$

In some cases, the user may require discrimination between T, the target, and N, some non-target. Let

$$K_T = \frac{[T][\mathrm{SBD}]}{[T{:}\mathrm{SBD}]}, \qquad K_N = \frac{[N][\mathrm{SBD}]}{[N{:}\mathrm{SBD}]},$$

then

$$\frac{K_T}{K_N} = \frac{[T][N{:}\mathrm{SBD}]}{[N][T{:}\mathrm{SBD}]}.$$

The user then specifies a maximum acceptable value for the ratio K_T/K_N.
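Because the free [SBD] appears in both K_T and K_N, it cancels in the ratio, so the specificity requirement can be evaluated from four equilibrium concentrations. A small Python sketch follows; the function names are hypothetical and the example concentrations are made up purely for illustration.

```python
def dissociation_constant(free_ligand: float, free_sbd: float,
                          complex_conc: float) -> float:
    """K = [ligand][SBD] / [ligand:SBD], all equilibrium concentrations in M."""
    return free_ligand * free_sbd / complex_conc


def specificity_ratio(t: float, n: float, t_sbd: float, n_sbd: float) -> float:
    """K_T / K_N = ([T][N:SBD]) / ([N][T:SBD]); the free [SBD] cancels."""
    return (t * n_sbd) / (n * t_sbd)


# Made-up example: equal free target and non-target, 1000-fold more T:SBD complex.
k_t = dissociation_constant(1e-6, 1e-9, 1e-7)
k_n = dissociation_constant(1e-6, 1e-9, 1e-10)
ratio = specificity_ratio(1e-6, 1e-6, 1e-7, 1e-10)
```

A user-specified limit would then be checked as `ratio <= max_ratio`, where `max_ratio` is whatever maximum the user chooses.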

If the target material is a general protease, one must consider the following points:

1) a highly specific protease can be treated like any other target,

2) a general protease, such as subtilisin, may degrade the OSPs of the GP, including OSP-PBDs, so there are several alternative ways of dealing with general proteases, including: a) a chemical inhibitor may be used to prevent proteolysis (e.g., phenylmethylsulfonyl fluoride (PMSF), which inhibits serine proteases), b) one or more active-site residues may be mutated to create an inactive protein (e.g., a serine protease in which the active serine is mutated to alanine), or c) one or more active-site amino acids of the protein may be chemically modified to destroy the catalytic activity (e.g., a serine protease in which the active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need not be inhibitors; SBDs that happen to inhibit the protease target are a fairly small subset of SBDs that bind to the protease target,

4) the more we modify the target protease, the less likely we are to obtain an SBD that inhibits the target protease, and

5) if the user requires that the SBD inhibit the target protease, then the active site of the target protease must not be modified any more than necessary; inactivation by mutation or chemical modification are preferred methods of inactivation, and a protein protease inhibitor becomes a prime candidate for IPBD. For example, BPTI could be mutated, by the methods of the present invention, to bind to proteases other than trypsin (TANK77 and TSCH87).

Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to the chosen target according to the criteria of Sec. 2. It is anticipated that a small collection of GP(IPBD)s can be assembled such that, for any chosen target, at least one member of the collection will be a suitable starting point for engineering a protein that binds to the chosen target by the methods of the present invention. The user should optimize the affinity separation for conditions appropriate to the intended use by the methods described in Part II.

Sec. 13.0: Identification of Family of PBDs, Related to PPBD, to Be Generated:

Sec. 13.1: Choosing residues on IPBD (or other PPBD) to vary:

We choose residues in the IPBD to vary through consideration of several factors, including: a) the 3D structure of the IPBD, b) sequences homologous to IPBD, and c) modeling of the IPBD and mutants of the IPBD. Because the number of residues that could strongly influence binding is always greater than the number that can be varied simultaneously, the user must pick a subset of those residues to vary at one time. The user must also pick trial levels of variegation and calculate the abundances of various sequences. The list of varied residues and the level of variegation at each varied residue are adjusted until the composite variegation is commensurate with C_sensi and lAxitv.

A key concept is that only structured proteins exhibit specific binding, i.e., can bind to a particular chemical entity to the exclusion of most others. Thus the residues to be varied are chosen with an eye to preserving the underlying IPBD structure. Substitutions that prevent the PBD from folding will cause GPs carrying those genes to bind indiscriminately, so that they can easily be removed from the population.

Burial of hydrophobic surfaces so that bulk water is excluded is one of the strongest forces driving the binding of proteins to other molecules. Bulk water can be excluded from the region between two molecules only if the surfaces are complementary. We must test as many surfaces as possible to find one that is complementary to the target. The selection-through-binding isolates those proteins that are more nearly complementary to some surface on the target. The effective diversity of a variegated population is measured by the number of different surfaces, rather than the number of protein sequences. Thus we should maximize the number of surfaces generated in our population, rather than the number of protein sequences.

In hypothetical example 1, we consider a hypothetical PBD, shown in Figure 3, binding to a hypothetical target. Figure 3 is a 2D schematic of 3D objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14, 15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39 of the IPBD are on the 3D surface of the IPBD, even though they are shown well inside the circle. Proteins do not have distinct, countable faces. Therefore we define an "interaction set" to be a set of residues such that all members of the set can simultaneously touch one molecule of the target material without any atom of the target coming closer than van der Waals distance to any main-chain atom of the IPBD. The concept of a residue "touching" a molecule of the target is discussed below. One hypothetical interaction set, Set A, in Figure 3 comprises residues 6, 7, 20, 21, 22, 33, and 34, represented by squares. Another hypothetical interaction set, Set B, comprises residues 1, 2, 4, 6, 31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example, through all twenty amino acids, we obtain 20 protein sequences and 20 different surfaces for interaction set A. Note that residue 6 is in two interaction sets, and variation of residue 6 through all 20 amino acids yields 20 versions of interaction set A and 20 versions of interaction set B.

Now consider varying two residues, each through all twenty amino acids, generating 400 protein sequences. If the two residues varied were, for example, number 1 and number 21, then there would be only 40 different surfaces, because interaction set A does not depend on residue 1 and interaction set B does not depend on residue 21. If, however, the two residues varied were number 7 and number 21, then 400 surfaces would be generated.
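The surface counts in this example can be reproduced by treating each interaction set separately: a set containing k varied residues contributes 20^k versions, and the totals are summed over sets, which is the counting used above (20 + 20 for residue 6, 40 for residues 1 and 21, 400 for residues 7 and 21). The Python sketch below assumes that model; `surfaces`, `SET_A`, and `SET_B` are illustrative names for the hypothetical sets of Figure 3.

```python
from typing import Iterable

# Hypothetical interaction sets of Figure 3.
SET_A = frozenset({6, 7, 20, 21, 22, 33, 34})
SET_B = frozenset({1, 2, 4, 6, 31, 37, 39})


def surfaces(varied: Iterable[int], interaction_sets: Iterable[frozenset],
             n_aa: int = 20) -> int:
    """Count distinct surfaces, summed over interaction sets, when each
    varied residue takes all n_aa amino acids. A set containing k varied
    residues contributes n_aa**k versions; sets with none contribute nothing."""
    varied = set(varied)
    total = 0
    for s in interaction_sets:
        k = len(varied & s)
        if k:
            total += n_aa ** k
    return total
```

Varying seven residues that each sit in a different interaction set gives 7 × 20 = 140 surfaces, whereas varying all seven members of Set A gives 20^7 for that set alone, which is the contrast drawn for spatially separated versus interacting residues.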

If N spatially separated residues are varied at one time, 20 × N surfaces are generated. Variation of N residues in the same interaction set yields $20^N$ surfaces. For example, if N = 7, variation of separated residues yields 140 surfaces, while variation of interacting residues yields $20^7 = 1.28 \times 10^9$ surfaces. Thus, to maximize the number of surfaces generated when N residues are varied, all residues should be in the same interaction set.
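The counting argument above can be checked with a small sketch. It assumes, as the text does, that every varied residue takes all 20 amino acids and that distinct surfaces are counted per interaction set and summed over the sets that contain at least one varied residue. The function `count_surfaces` and the dictionary layout are illustrative; the memberships are those of hypothetical Sets A and B of Figure 3.

```python
from typing import Dict, Iterable, Set

# Hypothetical interaction sets of Figure 3 (hypothetical example 1).
FIGURE3_SETS: Dict[str, Set[int]] = {
    "A": {6, 7, 20, 21, 22, 33, 34},
    "B": {1, 2, 4, 6, 31, 37, 39},
}

def count_surfaces(varied: Iterable[int],
                   sets: Dict[str, Set[int]] = FIGURE3_SETS,
                   choices: int = 20) -> int:
    """Sum, over interaction sets containing k >= 1 varied residues,
    of the number of distinct versions of that set, choices ** k."""
    varied = set(varied)
    total = 0
    for members in sets.values():
        k = len(members & varied)
        if k:
            total += choices ** k
    return total

print(count_surfaces([21]))       # 20
print(count_surfaces([6]))        # 40: 20 versions of Set A plus 20 of Set B
print(count_surfaces([1, 21]))    # 40
print(count_surfaces([7, 21]))    # 400

# N = 7: separated residues (each in its own set) versus one shared set.
separate = {str(i): {i} for i in range(7)}
shared = {"X": set(range(7))}
print(count_surfaces(range(7), separate), count_surfaces(range(7), shared))  # 140 1280000000
```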
The amount of surface area buried in strong protein-protein interactions ranges from 1000 Å² to 2000 Å² (SCHU79, p103ff). Individual amino acids have total surface areas that depend mostly on the type of amino acid and weakly on conformation. These areas range from about 180 Å² for glycine to about 360 Å² for tryptophan. From amino-acid solvent exposures of published protein structures, we calculate that 1000 Å² on a protein surface comprises between 4 and 30 amino-acid residues. Varied amino-acid sequences, as found in actual proteins, involve between 10 and 25 residues in forming 1000 Å² of protein surface. Schulz and Schirmer estimate that 100 Å² of protein surface can exhibit as many as 1000 different specific patterns (SCHU79, p105). The number of surface patterns rises exponentially with the area that can be varied independently. One of the BPTI structures recorded in the Brookhaven Protein Data Bank (6PTI), for example, has a total exposed surface area of 3997 Å² (using the method of Lee and Richards (LEEB71), a solvent radius of 1.4 Å, and atomic radii as shown in Table 7). If we could vary this surface freely, and if 100 Å² can produce 1000 patterns, we could construct about $10^{120}$ different patterns by varying the surface of BPTI! This calculation is intended only to suggest the huge number of possible surface patterns based on a common protein backbone.
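The pattern estimate follows from treating the 3997 Å² surface of 6PTI as about 40 independent 100 Å² patches with 1000 patterns each, the same idealization the text cautions against taking literally. A minimal sketch of that arithmetic (the function name is illustrative):

```python
import math

def log10_patterns(total_area: float, patch_area: float = 100.0,
                   patterns_per_patch: int = 1000) -> float:
    """log10 of the number of surface patterns if the surface were divided
    into independent patches (an idealization, as noted in the text)."""
    n_patches = total_area / patch_area
    return n_patches * math.log10(patterns_per_patch)

exponent = log10_patterns(3997.0)
print(f"about 10^{exponent:.0f} patterns")   # about 10^120 patterns
```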

One protein framework cannot, however, display all possible patterns over any one particular 100 Å² of surface merely by replacement of the side groups of surface residues. The protein backbone holds the varied side groups in approximately constant locations, so the variations are not independent. We can, nevertheless, generate a vast collection of different protein surfaces by varying those protein residues that face the outside of the protein.

Examination of a model of BPTI in contact with myoglobin shows that residues 3, 1, 8, 10, 13, 39, 41, and 42 can all simultaneously contact a molecule the size and shape of myoglobin. Residue 49 cannot touch a single myoglobin molecule simultaneously with any of the first set, even though all are on the surface of BPTI. It is not the intent of the present invention, however, to use models to determine which part of the target molecule will actually be the site of binding by a PBD.


For cassette mutagenesis, the protein residues to be varied are, preferably, close enough in sequence that the variegated DNA (vgDNA) encoding all of them can be made in one piece. The present invention is not limited to a particular length of vgDNA that can be synthesized. With current technology, a stretch of 60 amino acids (180 DNA bases) can be spanned.

One can use other mutational means, such as single-stranded-oligonucleotide-directed mutagenesis (BOTS85) using two or more mutating primers, to mutate widely separated residues.

Alternatively, to vary residues separated by more than sixty residues, two cassettes may be mutated. A first cassette is mutagenized to produce a population having, for example, up to 30,000 members. Using the variegated OCV, we then mutagenize a second cassette to produce a second variegated population having the desired diversity.
The composite level of variation must not exceed the prevailing capabilities to a) produce very large numbers of independently transformed cells or b) detect small components in a highly varied population. The limits on the level of variegation are discussed in Sec. 13.2.

We assemble the data about the IPBD and the target that are useful in deciding which residues to vary: 1) the 3D structure, or at least a list of residues on the surface, of the IPBD, 2) a list of sequences homologous to the IPBD, and 3) a model of the target molecule or a stand-in for the target.

These data and an understanding of the behavior of 
different amino acids in proteins will be used to 
answer two questions: 

1) which residues of the IPBD are on the outside 
and close enough together in space to touch the 
target simultaneously? 

2) which residues of the IPBD can be varied with 
high probability of retaining the underlying IPBD 
structure? 

Although an atomic model of the target material from X-ray crystallography, NMR, etc. is preferred in such examination, it is not necessary. For example, if the target were a protein of unknown 3D structure, it would be sufficient to know the molecular weight of the protein and whether it were a soluble globular protein, a fibrous protein, or a membrane protein. One can then choose a protein of known structure of the same class



wo 90/02809 



PCr/US89/03731 



87 - . / 

and similar size and shape to use as a molecular stand-in and yardstick. At low resolution, all proteins of a given size and class look much the same. The specific volumes are the same and all are more or less spherical; therefore, all proteins of the same size and class have about the same radius of curvature. The radii of curvature of the two molecules determine how much of the two molecules can come into contact.
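To make the radius-of-curvature argument concrete, the sketch below treats a globular protein as a sphere whose volume is its molecular weight times an assumed partial specific volume of 0.73 cm³/g (a typical value for proteins, not a value given in this document). The molecular weights used, about 6.5 kDa for BPTI and 17 kDa for myoglobin, are approximate.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def sphere_radius_angstrom(mol_weight_da: float,
                           specific_volume_cm3_per_g: float = 0.73) -> float:
    """Radius (angstroms) of a sphere with the volume of one protein
    molecule, from molecular weight and an assumed specific volume."""
    volume_cm3 = mol_weight_da * specific_volume_cm3_per_g / AVOGADRO
    volume_a3 = volume_cm3 * 1e24          # 1 cm^3 = 1e24 cubic angstroms
    return (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)

for name, mw in [("BPTI", 6500), ("myoglobin", 17000)]:
    print(f"{name}: radius ~{sphere_radius_angstrom(mw):.0f} angstrom")
# BPTI: ~12 angstrom; myoglobin: ~17 angstrom
```

Because radius scales only as the cube root of mass, proteins of similar size give nearly the same radius, which is why a stand-in of the same class and size is adequate.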

The most appropriate method of picking the residues of the protein chain at which the amino acids should be varied is by viewing, with interactive computer graphics, a model of the IPBD. A stick-figure representation of molecules is preferred. A suitable set of hardware is an Evans & Sutherland PS390 graphics terminal (Evans & Sutherland Corporation, Salt Lake City, UT) and a MicroVAX II supermicro computer (Digital Equipment Corp., Maynard, MA). Suitable programs for viewing and manipulating protein models include: a) PS-FRODO, written by T. A. Jones (JONE85) and distributed by the Biochemistry Department of Rice University, Houston, TX; and b) PROTEUS, developed by Dayringer, Tramontano, and Fletterick (DAYR86).

Theoretical calculations, such as dynamic simulations of proteins, are used to estimate the effect on the 3D structure of the parent protein of substituting a particular amino-acid type at a particular residue. Such calculations might also indicate whether a particular substitution will greatly affect the flexibility of the protein.

Sec. 13.1.1: The principal set:
Using the knowledge of which residues are on the surface of the IPBD, we pick residues that are close enough together on the surface of the IPBD to touch a molecule of the target simultaneously without having any IPBD main-chain atom come closer than van der Waals distance (viz. 4.0 to 5.0 Å) to any target atom. A residue of the IPBD "touches" the target if: a) a main-chain atom is within van der Waals distance, viz. 4.0 to 5.0 Å, of any atom of the target molecule, or b) the $C_\beta$ is within $D_{cutoff}$ of any atom of the target molecule, so that a side-group atom could make contact with that atom. Because side groups differ in size (cf. Table 35), some judgment is required in picking $D_{cutoff}$. In the preferred embodiment, we will use $D_{cutoff}$ = … Å, but other values in the range 6.0 Å to 10.0 Å could be used. If the IPBD has G at a residue, we construct a pseudo $C_\beta$ with the correct bond distance and angles and judge the ability of the residue to touch the target from this pseudo $C_\beta$.
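The two-part "touches" rule can be written as a short geometric test. In this sketch, coordinates are plain (x, y, z) tuples in ångströms. `d_cutoff` must be supplied by the caller (the text allows 6.0 Å to 10.0 Å), and the van der Waals distance defaults to the lower end of the 4.0 to 5.0 Å range. For glycine, the pseudo C-beta is placed with a widely used empirical formula built from the backbone N, C-alpha, and C atoms. That formula is a swapped-in technique, not the construction prescribed here, which only requires correct bond distance and angles.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _cross(p: Vec, q: Vec) -> Vec:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def pseudo_cbeta(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Idealized C-beta position from backbone N, C-alpha and C
    (common empirical construction; useful for glycine)."""
    b = _sub(ca, n)
    cc = _sub(c, ca)
    a = _cross(b, cc)
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i]
                 - 0.54067466 * cc[i] + ca[i] for i in range(3))

def touches(main_chain: Sequence[Vec], cbeta: Vec,
            target_atoms: Sequence[Vec], d_cutoff: float,
            vdw_distance: float = 4.0) -> bool:
    """Rule a): a main-chain atom within van der Waals distance of a target
    atom. Rule b): C-beta within d_cutoff of a target atom."""
    if any(math.dist(m, t) <= vdw_distance
           for m in main_chain for t in target_atoms):
        return True
    return any(math.dist(cbeta, t) <= d_cutoff for t in target_atoms)

# Illustrative backbone (near-ideal geometry) and a single target atom.
n, ca, c = (-1.458, 0.0, 0.0), (0.0, 0.0, 0.0), (0.5465, 1.4237, 0.0)
cb = pseudo_cbeta(n, ca, c)
target = [(0.0, 0.0, -8.0)]
print(touches([n, ca, c], cb, target, d_cutoff=8.0))   # True (via C-beta)
print(touches([n, ca, c], cb, target, d_cutoff=6.0))   # False
```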


Alternatively, we choose a set of residues on the surface of the IPBD such that the curvature of the surface defined by the residues in the set is not so great that it would prevent contact between all residues in the set and a molecule of the target. This method is appropriate if the target is a macromolecule, such as a protein, because the PBDs derived from the IPBD will contact only a part of the macromolecular surface.


We prefer that there be some indication that the underlying IPBD structure will tolerate substitutions at each residue in the principal set of residues. Indications could come from various sources, including: a) homologous sequences, b) static computer modeling, or c) dynamic computer simulations.

The residues in the principal set need not be contiguous in the protein sequence. We require only that the amino acids at the residues to be varied all be capable of touching a molecule of the target material simultaneously without having atoms overlap. If the target were, for example, horse heart myoglobin, and if the IPBD were BPTI, any set of residues in one interaction set of BPTI defined in Table 34 could be picked.

Preferably, the principal set contains eight to sixteen residues. This number of residues allows sufficient variability that a surface complementary to the target can be found, yet is small enough that a significant fraction of the surface can be varied at one time.


Sec. 13.1.2: The secondary set:

The secondary set comprises residues that touch residues in the principal set but are excluded from the principal set because the residue: a) is internal, b) is highly conserved, or c) is on the surface, but the curvature of the IPBD surface prevents the residue from being in contact with the target at the same time as one or more residues in the principal set.


Internal residues, although frequently conserved, may tolerate some conservative changes, such as I to L or F to Y. These changes affect the detailed placement and dynamics of adjacent protein residues, and such variation may be useful once an SBD is found.
Surface residues in the secondary set are most often located on the periphery of the principal set and do not make direct contact with the target simultaneously with all other residues of the principal set. The charge on the amino acid at one of these residues could, however, have a strong effect on binding. It is appropriate to vary the charge of some or all of these residues to improve an SBD. For example, the variegated codon containing equimolar A and G at base 1, equimolar C and A at base 2, and A at base 3 yields the amino acids T, K, A, and E with equal probability.
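Which amino acids a variegated codon can produce, and in what proportions, follows mechanically from the genetic code. The sketch below multiplies the nt fractions at the three bases and sums over synonymous codons; applied to the charge-varying codon just described, it gives T, K, A, and E at 25% each. The names `CODE` and `aa_distribution` are illustrative.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

def aa_distribution(base1: Dict[str, float], base2: Dict[str, float],
                    base3: Dict[str, float]) -> Dict[str, float]:
    """Abundance of each amino acid ('*' = stop) produced by a variegated
    codon whose three positions have the given nt fractions."""
    out: Dict[str, float] = {}
    for (n1, f1), (n2, f2), (n3, f3) in product(base1.items(),
                                                base2.items(),
                                                base3.items()):
        aa = CODE[n1 + n2 + n3]
        out[aa] = out.get(aa, 0.0) + f1 * f2 * f3
    return out

# Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
print(aa_distribution({"A": 0.5, "G": 0.5}, {"C": 0.5, "A": 0.5}, {"A": 1.0}))
# {'T': 0.25, 'K': 0.25, 'A': 0.25, 'E': 0.25}
```

The same function shows that mixing A and C at base 1 with A at base 2 and T/G at base 3 (the N/Q case discussed in Sec. 13.3) necessarily yields N, K, H, and Q together.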

Sec. 13.1.3: Choice of residues to vary initially:

The allowed level of variegation (the highest level that still assures progressivity) determines how many residues can be varied at once; geometry determines which ones.


The user picks residues to vary in many ways; the following is a preferred manner. Pairs of residues are picked that are diametrically opposed across the face of the principal set. Two such pairs are used to delimit the surface, up/down and right/left. Alternatively, three residues that form an inscribed triangle on the surface, having as large an area as possible, are picked. One to three other residues are picked in a checkerboard fashion across the interaction surface. Choosing widely spaced residues to vary creates the possibility of high specificity, because all the intervening residues must have acceptable complementarity before favorable interactions can occur at the widely separated residues.
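The geometric heuristics above (diametrically opposed pairs and the largest inscribed triangle) reduce to simple searches over residue coordinates. A brute-force sketch, assuming one representative point per residue (for example its C-beta) and hypothetical coordinates:

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Vec = Tuple[float, float, float]

def triangle_area(p: Vec, q: Vec, r: Vec) -> float:
    """Half the magnitude of the cross product of two edge vectors."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def most_distant_pair(pos: Dict[int, Vec]) -> Tuple[int, int]:
    """Residue pair that is farthest apart (diametrically opposed)."""
    return max(combinations(sorted(pos), 2),
               key=lambda pair: math.dist(pos[pair[0]], pos[pair[1]]))

def widest_triangle(pos: Dict[int, Vec]) -> Tuple[int, int, int]:
    """Three residues spanning the largest-area triangle."""
    return max(combinations(sorted(pos), 3),
               key=lambda t: triangle_area(pos[t[0]], pos[t[1]], pos[t[2]]))

# Hypothetical surface positions (angstroms) keyed by residue number.
surface = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (0.0, 10.0, 0.0),
           4: (2.0, 2.0, 0.0), 5: (5.0, 1.0, 0.0)}
print(most_distant_pair(surface))   # (2, 3)
print(widest_triangle(surface))     # (1, 2, 3)
```

Exhaustive search is cheap here because a principal set has only eight to sixteen residues.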

The number of residues picked is coupled to the range through which each can be varied by the restrictions discussed in Sec. 13.2. In the first round, we do not assume any binding between the IPBD and the target, so progressivity is not an issue. At the first round, the user may elect to produce a level of variegation such that each molecule of vgDNA is potentially different, for example through unlimited variegation of 10 codons ($20^{10} \approx 10^{13}$). One run of the DNA synthesizer produces approximately 10^… molecules of length 100 nts. Inefficiencies in ligation and transformation will reduce the number of proteins actually tested to between 10^… and 5 × 10^…. Multiple iterations of the process with such very high levels of variegation will not yield repeatable results; the user must decide whether this is important.

Sec. 13.2: Range of Variation at Each Site of Mutation:

The total level of variegation is the product of the number of variants at each varied residue. Each varied residue can have a different scheme of variegation, producing 2 to 20 different possibilities. We require that the process be progressive, i.e., that each variegation cycle produce a better starting point for the next variegation cycle than the previous cycle produced.
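Because the total level of variegation is a product, it grows quickly with the number of varied residues. A minimal sketch (the per-residue counts in the second call are hypothetical):

```python
import math
from typing import Iterable

def total_variegation(variants_per_residue: Iterable[int]) -> int:
    """Total number of amino-acid sequences: product of per-residue choices."""
    return math.prod(variants_per_residue)

# Ten codons, each varied through all 20 amino acids (the first-round example):
print(f"{total_variegation([20] * 10):.3e}")      # 1.024e+13
# Mixed scheme: six residues at 20, two at 4 (e.g., charge variation), one at 2.
print(total_variegation([20] * 6 + [4, 4, 2]))    # 2048000000
```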


N.B.: Setting the level of variegation such that the PPBD and many sequences related to the PPBD sequence are present in detectable amounts ensures that the process is progressive. If the level of variegation is so high that the PPBD sequence is present at such low levels that there is an appreciable chance that no transformant will display the PPBD, then the best SBD of the next round could be worse than the PPBD. At excessively high levels of variegation, each round of mutagenesis is independent of previous rounds, and there is no assurance of progressivity. This approach can lead to valuable binding proteins, but repetition of experiments with this level of variegation will not yield progressive results. Excessive variation is not preferred.



If the level of variegation is such that the parental sequence and each single amino-acid change are present for selection, then we know that a selected sequence is closer to optimal than the parent, or the same as it. If, on the other hand, very high levels of variegation are used, a sequence may be selected not because it is superior to the parental sequence, but because the parental and improved sequences are, by chance, absent.

Progressivity is not an all-or-nothing property. So long as most of the information obtained from previous variegation cycles is retained and many different surfaces that are related to the PPBD surface are produced, the process is progressive. If the level of variegation is so high that the PPBD gene may not be detected, the assurance of progressivity diminishes. If the probability of recovering the PPBD is negligible, then the probability of progressive behavior is also negligible.
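The risk described above can be quantified with a simple sampling model, assuming transformants are drawn independently from the vgDNA pool with no selection: a sequence present at fraction f is missing from T transformants with probability (1 - f)^T, roughly exp(-fT). The library size below is hypothetical.

```python
import math

def prob_absent(fraction: float, transformants: int) -> float:
    """Probability that a sequence at the given fraction of the vgDNA pool
    is not represented among independent transformants."""
    return (1.0 - fraction) ** transformants

def expected_copies(fraction: float, transformants: int) -> float:
    """Mean number of transformants carrying the sequence."""
    return fraction * transformants

library = 10**7  # hypothetical number of independent transformants
for f in (1e-6, 1e-7, 1e-8):
    print(f"f={f:.0e}: expected {expected_copies(f, library):.1f} copies, "
          f"P(absent)={prob_absent(f, library):.3g}, "
          f"exp(-fT)={math.exp(-f * library):.3g}")
```

When f·T is about 1, the chance that the PPBD is missing is about 37%; at f·T = 10 it falls below 0.01%.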


An opposing force in our design considerations is that PBDs are useful in the population only up to the amount that can be detected; any excess above the detectable amount is wasted. Thus we produce as many surfaces related to the PPBD as possible within the constraint that the PPBD be detectable.

We defer specification of exactly how much variegation is allowed until we have: a) specified real nt distributions for a variegated codon, and b) examined the effects of discrepancies between specified nt distributions and actual nt distributions.

Sec. 13.3: Design of vgDNA Encoding PBD Family:

We must now decide how to distribute the variegation within the codons for the residues to be varied. These decisions are influenced by the nature of the genetic code. When vgDNA is synthesized, variation at the first base of a codon creates a population containing amino acids from the same column of the genetic code table (as shown in Table 3-6 on pS7 of WATS87); variation at the second base of the codon creates a population containing amino acids from the same row of the genetic code table; variation at the third base of the codon creates a population containing amino acids from the same box. If two or three bases in the same codon are varied, the pattern is more complicated. Work with 3D protein structural models may suggest definite sets of amino acids to substitute at a given residue, but the method of variation may require that either more or fewer kinds of amino acids be included. For example, examination of a model might suggest substitution of N or Q at a given residue. Combinatorial variation of codons requires that mixing N and Q at one location also include K and H as possibilities at the same residue. One must choose to put: 1) N only, 2) Q only, or 3) a mixture of N, K, H, and Q. The present invention does not rely on accurate predictions of which amino acids should be placed at each residue; rather, attention is focused on which residues should be varied.

There are many ways to generate diversity in a protein (see RICH86, CARU85, and OLIP86). One extreme case is that one or a few residues of the protein are varied as much as possible (inter alia, see CARU85, CARU87, RICH86, and WHAR86). We will call this limit "Focused Mutagenesis". Focused Mutagenesis is appropriate when the IPBD or other PPBD shows little or no binding to the target, as at the beginning of the search for a protein to bind to a new target material. When there is no binding between the PPBD and the target, we preferably pick a set of five to seven residues and vary each through all 20 possibilities.

An alternative plan of mutagenesis ("Diffuse Mutagenesis") is to vary many more residues through a more limited set of choices (see Vershon et al., Ch15 of INOU86, and PAKU86). This can be accomplished by spiking each of the pure nts activated for DNA synthesis (e.g., nt-phosphoramidites) with a small amount of one or more of the other activated nts. Contrary to general practice, the present invention sets the level of spiking so that only a small percentage (1% to 0.00001%, for example) of the final product contains the initial DNA sequence. Many single, double, triple, and higher mutations occur, but recovery of the basic sequence is a possible outcome. Let $N_b$ be the number of bases to be varied, and let Q be the fraction of all sequences that should have the parental sequence; then M, the fraction of the mixture that is the majority component, is

$$M = \exp\left(\frac{\log_e Q}{N_b}\right) = 10^{\log_{10}(Q)/N_b}.$$

If, for example, thirty base pairs on the DNA chain were to be varied and 1% of the product is to have the parental sequence, then each mixed nt substrate should contain 86% of the parental nt and 14% of other nts. Table 8 shows the fraction ($f_n$) of DNA molecules having n non-parental bases when 30 bases are synthesized with reagents that contain fraction M of the majority component. When M = 0.63096, $f_{24}$ and higher are less than 10^…. The entry "most" in Table 8 is the number of changes that has the highest probability. Note that substantial probability for multiple substitutions occurs only if the fraction of parental sequence ($f_0$) is allowed to drop to around 10^…. Mutagenesis of this sort can be applied to any part of the protein at any time, but is most appropriate when some binding to the target has been established. The $N_b$ base pairs of the DNA chain that are synthesized with mixed reagents need not be contiguous. They are picked so that between $N_b/3$ and $N_b$ codons are affected to various degrees. The residues picked for mutation are picked with reference to the 3D structure of the IPBD, if known. For example, one might pick all or most of the residues in the principal and secondary sets. We may impose restrictions on the extent of variation at each of these residues based on homologous sequences or other data. The mixture of non-parental nts need not be random; rather, mixtures can be biased to give particular amino acid types specific probabilities of appearance at each codon. For example, one residue may contain a hydrophobic amino acid in all known homologous sequences; in such a case, the first and third bases of that codon would be varied, but the second would be set to T. This diffuse structure-directed mutagenesis will reveal the subtle changes possible in the protein backbone associated with conservative interior changes, such as V to I, as well as some not-so-subtle changes that require concomitant changes at two or more residues of the protein.
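The spiking formula and the entries of Table 8 can be reproduced with a binomial model, assuming each of the $N_b$ bases independently carries the parental nt with probability M. In the sketch below (function names illustrative), Q = 0.01 and $N_b$ = 30 give M ≈ 0.8577, the 86% quoted above, and M = 0.63096 corresponds to $f_0 = 10^{-6}$ for 30 bases.

```python
import math

def majority_fraction(q_parental: float, n_bases: int) -> float:
    """M such that M ** n_bases equals the desired parental fraction Q."""
    return math.exp(math.log(q_parental) / n_bases)

def f_n(n: int, n_bases: int, m: float) -> float:
    """Fraction of molecules with exactly n non-parental bases (binomial)."""
    return math.comb(n_bases, n) * (1.0 - m) ** n * m ** (n_bases - n)

m = majority_fraction(0.01, 30)
print(round(m, 4))                               # 0.8577, i.e. about 86% parental nt
dist = [f_n(n, 30, m) for n in range(31)]
most = max(range(31), key=lambda n: dist[n])
print(most)                                      # 4 (most probable number of changes)
print(round(majority_fraction(1e-6, 30), 5))     # 0.63096
```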


For Focused Mutagenesis, we now consider the distribution of nts that will be inserted at each variegated codon. Each codon could be programmed differently. If we have no information indicating that a particular amino acid or class of amino acid is appropriate, we strive to substitute all amino acids with equal probability, because representation of one PBD above the detectable level is wasteful. Equal amounts of all four nts at each position in a codon yield the amino acid distribution in which each amino acid is present in proportion to the number of codons that code for it. This distribution has the disadvantage of giving two basic residues for every acidic residue. In addition, R, S, and L occur six times as often as W or M. If five codons are synthesized with this distribution, sequences encoding five Rs are 7776 ($6^5$) times more abundant than sequences encoding five Ws. To have W-W-W-W-W present at detectable levels, we must have R-R-R-R-R present in 7776-fold excess.


Let Abun(x) be the abundance of DNA sequences coding for amino acid x, as defined by the distribution of nts at each base of the codon. For any distribution, there will be a most-favored amino acid (mfaa) with abundance Abun(mfaa) and a least-favored amino acid (lfaa) with abundance Abun(lfaa). We seek the nt distribution that allows all twenty amino acids and that yields the largest ratio Abun(lfaa)/Abun(mfaa), subject to two constraints: equal abundances of acidic and basic amino acids, and the least possible number of stop codons. Thus only nt distributions that yield Abun(E)+Abun(D) = Abun(R)+Abun(K) are considered, and the function maximized is:

$$\bigl(1 - \mathrm{Abun(stop)}\bigr)\,\frac{\mathrm{Abun(lfaa)}}{\mathrm{Abun(mfaa)}}$$

We have simplified the search for an optimal nt distribution by limiting the third base to T or G (C or G is equivalent). All amino acids are possible, and the number of accessible stop codons is reduced because TGA and TAA codons are eliminated. The amino acids F, Y, C, H, N, I, and D require T at the third base, while W, M, Q, K, and E require G. Thus we use an equimolar mixture of T and G at the third base.

A computer program, written as part of the present invention and named "Find Optimum vgCodon" (see Table 9), varies the composition at bases 1 and 2 in steps of 0.05 and reports the composition that gives the largest value of the quantity (1 - Abun(stop)) × Abun(lfaa)/Abun(mfaa). A vg codon is symbolically defined by the nt distribution at each base:









              T     C     A     G
base #1:     t1    c1    a1    g1     with t1 + c1 + a1 + g1 = 1.0
base #2:     t2    c2    a2    g2     with t2 + c2 + a2 + g2 = 1.0
base #3:     t3 = g3 = 0.5, c3 = a3 = 0.

The variation of the quantities t1, c1, a1, g1, t2, c2, a2, and g2 is subject to the constraint that Abun(E)+Abun(D) equals Abun(K)+Abun(R):

$$\mathrm{Abun(E)} + \mathrm{Abun(D)} = g_1 a_2$$

$$\mathrm{Abun(K)} + \mathrm{Abun(R)} = \tfrac{1}{2} a_1 a_2 + c_1 g_2 + \tfrac{1}{2} a_1 g_2$$

$$g_1 a_2 = \tfrac{1}{2} a_1 a_2 + c_1 g_2 + \tfrac{1}{2} a_1 g_2$$

Solving for g2, we obtain

$$g_2 = \frac{g_1 a_2 - 0.5\, a_1 a_2}{c_1 + 0.5\, a_1}$$

In addition,

$$t_1 = 1 - a_1 - c_1 - g_1, \qquad t_2 = 1 - a_2 - c_2 - g_2.$$

We vary a1, c1, g1, a2, and c2 and then calculate t1, g2, and t2. Initially, variation is in steps of 5%. Once an approximately optimum distribution of nts is determined, the region is further explored with steps of 1%. The logic of this program is shown in Table 9. The optimum distribution is:

Optimum vgCodon

              T      C      A      G
base #1 =   0.26   0.18   0.26   0.30
base #2 =   0.22   0.16   0.40   0.22
base #3 =   0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type of amino acid with the abundances shown in Table 10.
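The search described above is small enough to sketch directly. This re-implementation is not the program of Table 9. It derives the 20 amino-acid abundances in closed form from the genetic code with base 3 fixed at 50% T and 50% G (so TAG is the only accessible stop codon), performs only the 5% grid pass, and omits the 1% refinement. All names are illustrative; the 5% pass evaluates roughly 400,000 compositions.

```python
from typing import Dict, Tuple

def abundances(t1: float, c1: float, a1: float, g1: float,
               t2: float, c2: float, a2: float, g2: float
               ) -> Tuple[Dict[str, float], float]:
    """Abundance of each amino acid, plus that of the stop codon TAG, for a
    vg codon whose base 3 is 50% T and 50% G (closed form from the code)."""
    h = 0.5  # fraction of T, and of G, at base 3
    aa = {
        "F": t1 * t2 * h, "L": t1 * t2 * h + c1 * t2,   # TTT; TTG, CTT, CTG
        "S": t1 * c2 + a1 * g2 * h, "Y": t1 * a2 * h,   # TCT, TCG, AGT; TAT
        "C": t1 * g2 * h, "W": t1 * g2 * h,             # TGT; TGG
        "P": c1 * c2, "H": c1 * a2 * h, "Q": c1 * a2 * h,
        "R": c1 * g2 + a1 * g2 * h,                     # CGT, CGG, AGG
        "I": a1 * t2 * h, "M": a1 * t2 * h, "T": a1 * c2,
        "N": a1 * a2 * h, "K": a1 * a2 * h,
        "V": g1 * t2, "A": g1 * c2, "D": g1 * a2 * h, "E": g1 * a2 * h,
        "G": g1 * g2,
    }
    return aa, t1 * a2 * h                              # TAG

def score(*fractions: float) -> float:
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa) for
    fractions = (t1, c1, a1, g1, t2, c2, a2, g2)."""
    aa, stop = abundances(*fractions)
    return (1.0 - stop) * min(aa.values()) / max(aa.values())

def find_optimum(step: float = 0.05):
    """Grid over a1, c1, g1, a2, c2; t1, g2, t2 follow from the
    acid/base constraint and the sum-to-one conditions."""
    n = round(1.0 / step)
    best = (-1.0, None)
    for ia1 in range(n + 1):
        for ic1 in range(n + 1 - ia1):
            for ig1 in range(n + 1 - ia1 - ic1):
                a1, c1, g1 = ia1 * step, ic1 * step, ig1 * step
                t1 = (n - ia1 - ic1 - ig1) * step
                denom = c1 + 0.5 * a1
                if denom == 0:
                    continue  # g2 undefined
                for ia2 in range(n + 1):
                    for ic2 in range(n + 1 - ia2):
                        a2, c2 = ia2 * step, ic2 * step
                        g2 = (g1 * a2 - 0.5 * a1 * a2) / denom
                        t2 = 1.0 - a2 - c2 - g2
                        if g2 < 0 or t2 < 0:
                            continue
                        s = score(t1, c1, a1, g1, t2, c2, a2, g2)
                        if s > best[0]:
                            best = (s, ((t1, c1, a1, g1), (t2, c2, a2, g2)))
    return best
```

Evaluating `score` at the published optimum, with g2 recomputed from the constraint (about 0.219), gives about 0.386, a useful reference for any grid optimum.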

The computer that controls a DNA synthesizer, such as the Milligen 7500, can be programmed to synthesize any base of an oligo-nt with any distribution of nts by taking some nt substrates (e.g., nt phosphoramidites) from each of two or more reservoirs. Alternatively, nt substrates can be mixed in any ratio and placed in one of the extra reservoirs for so-called "dirty bottle" synthesis.

The actual nt distribution obtained will differ from the specified nt distribution due to several causes, including: a) differential inherent reactivity of nt substrates, and b) differential deterioration of reagents. It is possible to compensate partially for these effects, but some residual error will occur. We denote the average discrepancy between specified and observed nt fractions as $S_{err}$:

$$S_{err} = \sqrt{\operatorname{average}\left[\left(\frac{f_{obs} - f_{spec}}{f_{spec}}\right)^{2}\right]}$$

where $f_{obs}$ is the amount of one type of nt found at a base and $f_{spec}$ is the amount of that type of nt that was specified at the same base. The average is over all specified types of nts and over a number (e.g., 10 or 20) of different variegated bases. By hypothesis, the actual nt distribution at a variegated base will be within 5% of the specified distribution. Actual DNA synthesizers and DNA synthetic chemistry may have different error levels. It is the user's responsibility to determine $S_{err}$ for the DNA synthesizer and chemistry employed.
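The definition of $S_{err}$ translates directly into code. In this sketch the average skips entries whose specified fraction is zero (such as C and A at base 3), where the relative error is undefined; that choice is an assumption. The example compares base #1 of "Optimum vgCodon" with base #1 of the worst-case distribution given later in this section.

```python
import math
from typing import Sequence

def s_err(observed: Sequence[float], specified: Sequence[float]) -> float:
    """Root-mean-square fractional deviation of observed from specified nt
    fractions; entries specified as 0 are skipped (relative error undefined)."""
    if len(observed) != len(specified):
        raise ValueError("observed and specified must have the same length")
    terms = [((fo - fs) / fs) ** 2
             for fo, fs in zip(observed, specified) if fs > 0]
    return math.sqrt(sum(terms) / len(terms))

specified_base1 = [0.26, 0.18, 0.26, 0.30]      # T, C, A, G
observed_base1 = [0.251, 0.189, 0.273, 0.287]
print(round(s_err(observed_base1, specified_base1), 3))   # 0.045
```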
To determine the possible effects of errors in nt composition on the amino-acid distribution, we modified the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases is allowed to vary from its optimum value times $(1 - S_{err})$ to the optimum value times $(1 + S_{err})$ in seven equal steps ($S_{err}$ is the hypothetical fractional error level entered by the user); the sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e., we dropped the restriction that Abun(D)+Abun(E) = Abun(K)+Abun(R),

3) t3 and g3 are varied from 0.5 times $(1 - S_{err})$ to 0.5 times $(1 + S_{err})$ in three equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is sought.

In actual experiments, we will direct the synthesizer to produce the optimum DNA distribution "Optimum vgCodon" given above. Incomplete control over DNA chemistry may, however, cause us to actually obtain the following distribution, which is the worst that can be obtained if all nt fractions are within 5% of the amounts specified in "Optimum vgCodon". A corresponding table can be calculated for any given $S_{err}$ using the program "Find worst vgCodon within Serr of given distribution" given in Table 11.
              T       C       A       G
base #1 =   0.251   0.189   0.273   0.287
base #2 =   0.209   0.160   0.400   0.231
base #3 =   0.475   0.0     0.0     0.525

This distribution yields DNA encoding different amino acids at the abundances shown in Table 12.

If five codons are synthesized with reagents mixed so as to produce the nt distribution "Optimum vgCodon", and if we actually obtained the nt distribution "Optimum vgCodon, worst 5% errors", then DNA sequences encoding the mfaa at all five codons are about 277 times as likely as DNA sequences encoding the lfaa at all five codons; about 24% of the DNA sequences will have a stop codon in one or more of the five codons.

When five codons are synthesized using equimolar mixtures of all four nts at each base, $(\mathrm{Abun(mfaa)}/\mathrm{Abun(lfaa)})^5 = 7776$. If we program the optimum nt distribution and come within 5%, then $(\mathrm{Abun(mfaa)}/\mathrm{Abun(lfaa)})^5 = 277$. The total number of different PBDs is unchanged, but the least-favored sequence is about 28 times more abundant relative to the most-favored sequence. Detecting the least-favored amino-acid sequence when varying four residues with equimolar nts at each varied base requires as sensitive a separation system as does detecting the least-favored amino-acid sequence when varying five residues with the optimized nt distribution.
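The figures in the two preceding paragraphs can be checked with a few lines of arithmetic. The worst-case ratio of 277 is taken from the text (it depends on Table 12, not reproduced here); the stop-codon fraction is recomputed from the worst-case distribution, in which TAG is the only accessible stop codon.

```python
n_codons = 5

# Equimolar nts at all three bases: R, S and L have 6 codons each, W and M have 1.
equimolar = (6 / 1) ** n_codons
print(equimolar)                                # 7776.0

worst_optimized = 277                           # from the text, "Optimum vgCodon" within 5%
print(round(equimolar / worst_optimized, 1))    # 28.1

# TAG = T at base 1 x A at base 2 x G at base 3, worst-case 5% distribution.
p_stop = 0.251 * 0.400 * 0.525
p_any_stop = 1 - (1 - p_stop) ** n_codons
print(round(p_any_stop, 3))                     # 0.237, i.e. about 24%
```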






By hypothesis, the distribution "Optimum vgCodon" is used in the second version of the second variegation of hypothetical example 2. The abundance of the DNA encoding each type of amino acid is, however, taken from Table 12. The abundance of DNA encoding the parental amino acid sequence is:

$$\begin{aligned}
\mathrm{Amount}(\text{parental seq. F24 G30 D34 E42 T47}) &= \mathrm{Abun(F)} \times \mathrm{Abun(G)} \times \mathrm{Abun(D)} \times \mathrm{Abun(E)} \times \mathrm{Abun(T)} \\
&= 0.0249 \times 0.0663 \times 0.0545 \times 0.0602 \times 0.0437 \\
&= 2.4 \times 10^{-7}
\end{aligned}$$

Therefore, DNA encoding the PPBD sequence, as well as very many related sequences, will be present in sufficient quantity to be detected, and we are assured that the process will be progressive.
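The product above, and what it implies for a library of a given size, can be computed directly. The abundances are those quoted from Table 12; the library size is hypothetical.

```python
import math

# Abundances quoted above for the five parental amino acids (Table 12).
parental_abundances = [0.0249, 0.0663, 0.0545, 0.0602, 0.0437]  # F, G, D, E, T
amount = math.prod(parental_abundances)
print(f"{amount:.1e}")                           # 2.4e-07

library = 10**8                                  # hypothetical independent transformants
print(f"expected parental clones: {amount * library:.0f}")
```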

A level of variegation that allows recovery of the 
PPBD has two properties: 

1) we cannot regress because the PPBD is available,

2) an enormous number of multiple changes related 
to the PPBD are available for selection and we are 
able to detect and benefit from these changes. 

The user must adjust the list of residues to be varied and the levels of variegation at each residue until the calculated variegation is within the bounds set by $M_{ntv}$ and $C_{sensi}$.

Preferably, we also consider the interactions between the sites of variegation and the surrounding DNA. If the method of mutagenesis to be used is replacement of a cassette, we consider whether the variegation will generate gratuitous restriction sites and whether they seriously interfere with the intended introduction of diversity. We reduce or eliminate gratuitous restriction sites by appropriate choice of variegation pattern and silent alteration of codons neighboring the sites of variegation. See the Detailed Example.
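A check for gratuitous restriction sites can be sketched by writing the variegated cassette with the standard IUPAC degenerate-base codes and asking, at each position, whether any realization of the mixture could spell out a recognition site. The cassette sequences below are hypothetical, and only the given strand is scanned; for a non-palindromic site, its reverse complement should be scanned as well. EcoRI (GAATTC) is used purely as an example.

```python
from typing import List

# Standard IUPAC nucleotide codes and the bases each can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def possible_site_positions(variegated: str, site: str) -> List[int]:
    """Start positions at which at least one realization of the variegated
    sequence (IUPAC codes) matches the restriction site exactly."""
    variegated, site = variegated.upper(), site.upper()
    hits = []
    for start in range(len(variegated) - len(site) + 1):
        window = variegated[start:start + len(site)]
        if all(s in IUPAC[v] for v, s in zip(window, site)):
            hits.append(start)
    return hits

print(possible_site_positions("ATGNNNTTCGGA", "GAATTC"))   # [3]
print(possible_site_positions("GCTNNKNNKGCA", "GAATTC"))   # []
```

In the first cassette, an NNN codon followed by TTC can spell GAATTC; in the second, no window is compatible with GAATTC, so that variegation pattern cannot create a gratuitous EcoRI site.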

Sec. 14.1: Insertion of synthetic vgDNA into a Plasmid:

For cassette mutagenesis, restriction sites were designed and synthesized, and are used to introduce the synthetic vgDNA into the OCV. Restriction digestions and ligations are performed by standard methods (AUSU87). In the case of single-stranded-oligonucleotide-directed mutagenesis, synthetic vgDNA is used to create diversity in the vector (BOTS85).

Sec. 14.2: Transformation of cells:

The present invention is not limited to any one method of transforming cells with DNA. Standard methods, such as those described in MANI82, may be optimized for the particular host cells and OCV. The goal is to produce a large number of independent transformants, preferably 10^… or more. It is not necessary to isolate transformed cells between transformation and affinity separation. We prefer to have transformed cells at high concentration so that they can be plated densely on relatively few plates.



Sec. 14.3: Growth of the GP(vgPBD) Population:



The transformed cells are grown first under
non-selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the
osp-pbd gene at the appropriate level of induction, as
determined in Sec. 10.1. The GPs carrying the IPBD are
harvested by a method appropriate to the package.


A high level of diversity can be generated by in
vitro variegated synthesis of DNA, and this diversity
can be maintained passively through several generations
in an organism without positive selective pressure.
Loss or reduction in frequency of deleterious mutations
is advantageous for the purposes of the present
invention. It is preferable that the selection be
performed before more than a few generations elapse.
Moreover, subdividing the variegated population before
amplification in an organism by removing a small sample
(less than 10%) for further work would result in loss
of diversity; therefore, one should use all or most of
the synthetic DNA and most or all of the transformed
cells.
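The loss of diversity from subsampling can be estimated with a standard Poisson approximation. Assuming, for illustration, that N variants are equally likely and that each transformant carries one variant chosen at random, the expected fraction of variants present at least once among T transformants is 1 - exp(-T/N). The numbers below are hypothetical.

```python
import math


def expected_coverage(n_variants, n_transformants):
    """Expected fraction of `n_variants` equally likely sequences that occur
    at least once among `n_transformants` independent transformants
    (Poisson approximation: 1 - exp(-T/N))."""
    return 1.0 - math.exp(-n_transformants / n_variants)


n_variants = 1.0e7          # hypothetical number of distinct variegated sequences
all_cells = 3.0e7           # hypothetical number of independent transformants
sample = 0.10 * all_cells   # a 10% sample removed before amplification

print(f"all cells:  {expected_coverage(n_variants, all_cells):.2f}")  # 0.95
print(f"10% sample: {expected_coverage(n_variants, sample):.2f}")     # 0.26
```

Under these assumptions, working from a 10% sample would leave roughly three quarters of the variants unrepresented.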


Sec. 15.0: Isolation of GP(PBD)s with binding-to-target phenotypes:

The harvested packages are enriched for the
binding-to-target phenotype by use of affinity
separation involving target material immobilized on a
matrix. Packages that fail to bind to target material
are washed away. If the packages are bacteriophage or
endospores, it may be desirable to include a
bactericidal agent, such as azide, in the buffer to
prevent bacterial growth.

Sec. 15.1: Attaching the target material to a column:
Affinity column chromatography is the preferred
method of affinity separation, but other affinity
separation methods may be used. A variety of
commercially available support materials for affinity
chromatography may be used. These include derivatized
beads to which the target material is covalently
linked, or non-derivatized material to which the target
material adheres irreversibly.

Suppliers of support material for affinity
chromatography include: Applied Protein Technologies,
Cambridge, MA; Bio-Rad Laboratories, Rockville Center,
NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with the
directions of the manufacturer of each matrix
preparation, with consideration of good presentation of
the target.

Sec. 15.2: Reducing selection due to non-specific binding:

We reduce non-specific binding of GP(PBD)s to the
matrix that bears the target in two ways:

1) we treat the column with blocking agents, such
as genetically defective GPs or a solution of
protein, before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a
matrix containing no target, or a different target
from the same class as the actual target, prior to
affinity chromatography.



Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of our
population that exhibit non-specific binding to the
matrix or to molecules of the same class as the target.
If the target were horse heart myoglobin, for example,
a column supporting bovine serum albumin could be used
to trap GPs exhibiting PBDs with strong non-specific
binding to proteins. If cholesterol were the target,
then a hydrophobic compound, such as
p-tertiarybutylbenzyl alcohol, could be used to remove
GPs displaying PBDs having strong non-specific binding
to hydrophobic compounds. It is anticipated that PBDs
that fail to fold or that are prematurely terminated
will be non-specifically sticky. The capacity of the
initial column that removes indiscriminately adhesive
PBDs should be greater (e.g. 5-fold greater) than that
of the column that supports the target molecule.

Variation in the support material (polystyrene,
glass, agarose, etc.) in analysis of clones carrying
SBDs is used to eliminate enrichment for packages that
bind to the support material rather than the target.

Sec. 15.3: Eluting the column:

The population of GPs is applied to an affinity
matrix under conditions compatible with the intended
use of the binding protein, and the population is
fractionated by passage of a gradient of some solute
over the column. The process enriches for PBDs having
affinity for the target and for which the affinity for
the target is least affected by the eluants used.
Ions or cofactors needed for stability of PBDs
(derived from IPBD) or target must be included in
buffers at appropriate levels. We first remove
GP(PBD)s that do not bind the target by washing the
matrix with the volume of the initial buffer required
to bring the optical density (at 260 nm or 280 nm) back
to baseline, plus one to five void volumes (V_v). The
column is then eluted with a gradient of increasing: a)
salt, b) [H+] (decreasing pH), c) neutral solutes, d)
temperature (increasing or decreasing), or e) some
combination of these factors. Salt is the most
preferred solute for gradient formation. Other solutes
that generally weaken non-covalent interactions may also
be used. "Salt" includes solutions containing any of
the following ionic species:



     Na+        K+         Ca++       Mg++
     NH4+       Li+        Sr++       Ba++
     Rb+        Cs+        Cl-        Br-
     SO4--      HSO4-      PO4---     HPO4--
     H2PO4-     CO3--      HCO3-      Acetate
     Citrate    Standard L-amino acids
     Standard nucleotides             Guanidinium Cl

Other ionic or neutral solutes may be used. All
solutes are subject to the requirement that they not kill
the genetic packages. Neutral solutes, such as
ethanol, acetone, ether, or urea, are frequently used
in protein purification; however, many of these are
very harmful to bacteria and bacteriophage above low
concentrations. Bacterial spores, on the other hand,
are impervious to most neutral solutes. Several passes
may be made through the steps in Sec. 15. Different
solutes may be used in different analyses: salt in one,
pH in the next, etc.

Sec. 15.4: Recovery of packages:


Recovery of packages that display binding to an
affinity column may be achieved in several ways,
including from:

1) fractions eluted with a gradient as described
above,

2) fractions eluted with soluble target material,

3) cells grown in situ on the matrix,

4) cells incubated with parts of the matrix,

5) fractions eluted after chemically or
enzymatically degrading the linkage holding the
target to the matrix, and

6) regeneration of GPs after degrading the
packages and recovering OCV DNA.


It is possible to utilize combinations of these
methods. It should be remembered that what we want to
recover from the affinity matrix is not the GPs per se,
but the information in them. Recovery of viable GPs is
very strongly preferred, but recovery of genetic
material is essential.

Inadvertent inactivation of the GPs is very
deleterious. It is preferred that the maximum
concentrations of solutes that do not inactivate the GPs or denature the
target or the column be determined. One may use
conditions that denature the column to elute GPs;
before the target is denatured, a portion of the
affinity matrix should be removed for possible use as
an inoculum. As the GPs are held together by
protein-protein interactions and other non-covalent molecular
interactions, there will be cases in which the
molecular package binds so tightly to the target
molecules on the affinity matrix that the GPs cannot
be washed off in viable form. This will only occur
when very tight binding has been obtained. In these
cases, methods (3) through (5) above can be used to
obtain the bound packages or the genetic messages from
the affinity matrix.



It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pH_b) but not at another pH (pH_e). The
population is applied at pH_b and the column is washed
thoroughly at pH_b. The column is then eluted with
buffer at pH_e, and GPs that come off at the new pH are
collected and cultured. Similar procedures may be used
for other solution parameters, such as temperature.
For example, GP(vgPBD)s could be applied to a column
supporting insulin. After eluting with salt to remove
GPs with little or no binding to insulin, we elute with
salt and glucose to liberate GPs that display PBDs that
bind insulin or glucose in a competitive manner.

Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium or, in the
case of phage, by infection into a host so cultivated. If
the GPs have been inactivated by the chromatography,
the OCV carrying the osp-pbd gene must be recovered
from the GP and introduced into a new, viable host.

Sec. 15.6: Determining whether further enrichment is needed:

The probability of isolating a GP with improved
binding increases by a factor of C_eff with each separation cycle.
Let N be the number of distinct amino-acid sequences
produced by the variegation. We want to perform K
separation cycles before attempting to isolate an SBD,
where K is such that the probability of isolating a
single SBD is 0.10 or higher:

$$K = \left\lceil \frac{\log_{10}\left(10^{-1} N\right)}{\log_{10} C_{\mathrm{eff}}} \right\rceil$$

For example, if $N$ were $1.0 \times 10^{7}$ and $C_{\mathrm{eff}} = 6.31 \times 10^{2}$,
then $\log_{10}(1.0 \times 10^{6}) / \log_{10}(6.31 \times 10^{2}) = 6.0000/2.8000 = 2.14$.
Therefore we would attempt to isolate SBDs
after the third separation cycle. After only two
separation cycles, the probability of finding an SBD is
$(6.31 \times 10^{2})^{2} / (1.0 \times 10^{7}) \approx 0.04$, and attempting to
isolate SBDs might be profitable.
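The calculation of K can be checked directly. The sketch below assumes, as the worked example does, that the chance of picking an SBD after k cycles is approximately C_eff^k / N (capped at 1); the function names are illustrative.

```python
import math


def cycles_needed(n_sequences, c_eff, p_min=0.10):
    """Smallest integer K such that c_eff**K / n_sequences >= p_min,
    i.e. K = ceil(log10(p_min * N) / log10(C_eff))."""
    return math.ceil(math.log10(p_min * n_sequences) / math.log10(c_eff))


def p_isolate(cycles, n_sequences, c_eff):
    """Approximate probability of isolating an SBD after `cycles` separations."""
    return min(1.0, c_eff ** cycles / n_sequences)


print(cycles_needed(1.0e7, 6.31e2))           # 3
print(round(p_isolate(2, 1.0e7, 6.31e2), 2))  # 0.04
```

Because the ratio is computed in floating point, a value that should be an exact integer could round up by one; the example (2.14) is far from that boundary.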

Clonal isolates from the last fraction eluted in
Sec. 15.3 containing any viable GPs, as well as clonal
isolates obtained by culturing an inoculum taken from
the affinity matrix, are cultured. If K separation
cycles have been completed, samples from a number (e.g.
32) of these clonal isolates are tested for elution
properties on the {target} column. If none of the
isolated, genetically pure GPs show improved binding to
target, or if K cycles have not yet been completed,
then we pool and culture, in a manner similar to that
set forth in Sec. 14.3, the GPs from the last
few fractions eluted (see Sec. 15.4) that contained
viable GPs and the GPs obtained by culturing an
inoculum taken from the column matrix. We then repeat
the enrichment procedure described in Sec. 15. This
cyclic enrichment may continue for N_chrom passes or until
an SBD is isolated.

If one or more of the isolated GPs has improved
retention on the {target} column, we determine whether
the retention of the candidate SBDs is due to affinity
for the target material. Target material is attached
to a different support matrix at optimal density and
the elution volumes of candidate GP(SBD)s are measured.
We pick the candidate that either has the highest
elution volume or that is retained on the column after
elution. If none of the candidate GP(SBD)s has higher
elution volume than GP(PPBD of this round), then we
pool and culture the GPs from the last few fractions
that contained viable GPs and the GPs obtained by
culturing an inoculum taken from the column matrix. We
then repeat the enrichment procedure of Sec. 15.

If all of the SBDs show binding that is superior
to PPBD of this round, we pool and culture the GPs from
the last fraction that contains viable GPs and from the
inoculum taken from the column. This population is
re-chromatographed for at least one pass to further
fractionate the GPs based on K_d.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper
phage or be reverse transcribed and the DNA amplified.
The amplified DNA could then be sequenced or subcloned
into suitable plasmids.

Sec. 15.7: Characterizing the Population:



We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these
strains by genetic and affinity methods to determine
genotype and phenotype with respect to binding to
target. For several genetically pure isolates that
show binding, we demonstrate that the binding is caused
by the artificial chimeric gene by excising the osp-sbd
gene and crossing it into the parental GP. We also
ligate the deleted backbone of each GP from which the
osp-sbd is removed and demonstrate that each backbone
alone cannot confer binding to the target on the GP.
We sequence the osp-sbd gene from several clonal
isolates.

Sec. 15.8: Testing of binding affinity:


For one or more clonal isolates, we subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced as
a free protein. Each SBD protein is purified by normal
means, including affinity chromatography. Physical
measurements of the strength of binding are then made
on each free SBD protein by one of the following
methods: 1) alteration of the Stokes radius as a
function of binding of the target material, measured by
characteristics of elution from a molecular sizing
column such as agarose, 2) retention of radiolabeled
SBD on a spun affinity column to which the target
material has been affixed, or 3) retention of radiolabeled
target material on a spun affinity column to which the
SBD has been affixed. The measurements of binding for
each free SBD are compared to the corresponding
measurements of binding for the PPBD.



In each assay, we measure the extent of binding as
a function of concentration of each protein and other
relevant physical and chemical parameters.

In addition, the SBD with highest affinity for the
target from each round is compared to the best SBD of
the previous round (IPBD for the first round) and to
the IPBD with respect to affinity for the target
material. Successive rounds of mutagenesis and
selection-through-binding yield increasing affinity
until desired levels are achieved.
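One common way to reduce binding measured as a function of concentration to a single number is to fit a one-site binding isotherm, fraction bound = [L]/(K_d + [L]). The text does not prescribe a model; the single-site assumption, the grid-search fit, and the example concentrations (generated from the model itself, not measured) are illustrative only.

```python
import math


def fraction_bound(conc, kd):
    """One-site binding isotherm: fraction of sites occupied at ligand `conc`."""
    return conc / (kd + conc)


def fit_kd(concs, fractions, lo=1e-12, hi=1e-3, steps=2000):
    """Least-squares K_d estimate by scanning a log-spaced grid from lo to hi (M)."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        sse = sum((f - fraction_bound(c, kd)) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Illustrative, noise-free data generated with K_d = 50 nM.
concs = [1e-9, 1e-8, 1e-7, 1e-6]
fractions = [fraction_bound(c, 5e-8) for c in concs]
print(f"{fit_kd(concs, fractions):.2e}")  # should be close to 5e-08
```

With real, noisy data a proper nonlinear least-squares routine and replicate measurements would be preferable; the fitted K_d of each free SBD could then be compared with that of the PPBD, as the text describes.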

If binding is not yet sufficient, we must decide 
which residues to vary next (see Sec. 16.0). 

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind
fluorescently labeled target, with the optimized
parameters determined in Part II. We discriminate
against artifactual binding to the fluorescent label by
using two or more different dyes, chosen to be
structurally different.

Electrophoretic affinity separation uses unaltered
target, so that only other ions in the buffer can give
rise to artifactual binding. Artifactual binding to
the gel material gives rise to retardation independent
of field direction and so is easily eliminated. A
variegated population of GPs will have a variety of
charges.

First, the variegated population of GPs is
electrophoresed in a gel that contains no target
material. The electrophoresis continues until the GPs
are distributed along the length of the lane. The
target-free lane in which the initial electrophoresis
is conducted is separated by a removable baffle from a
square of gel that contains target material. The
baffle is removed and a second electrophoresis is
conducted at right angles to the first. GPs that do
not bind target migrate with unaltered mobility, while
GPs that do bind target will separate from the majority
that do not bind target. A diagonal line of
non-binding GPs will form. This line is excised and
discarded. Other parts of the gel are dissolved and
the GPs cultured.
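The two-dimensional separation can be pictured as a filter on spot positions. In the idealized sketch below, which assumes that non-binding GPs have the same mobility in both dimensions (and so fall on the diagonal) and that binding to target only slows migration in the second dimension, a GP is kept when its second-dimension mobility falls short of the first by more than a chosen fractional tolerance. The spot data and the 5% tolerance are hypothetical.

```python
def off_diagonal(spots, tolerance=0.05):
    """Return labels of GPs retarded in the second (target-containing) dimension.

    `spots` is a list of (label, mobility_dim1, mobility_dim2). Non-binders lie
    on the diagonal (mobility_dim2 == mobility_dim1); a GP is kept when its
    fractional retardation (m1 - m2) / m1 exceeds `tolerance`.
    """
    kept = []
    for label, m1, m2 in spots:
        if m1 > 0 and (m1 - m2) / m1 > tolerance:
            kept.append(label)
    return kept


spots = [("gp1", 1.00, 1.00), ("gp2", 0.80, 0.79), ("gp3", 1.00, 0.60)]
print(off_diagonal(spots))  # ['gp3']
```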

Sec. 16.0: The Next Variegation Cycle:

Which residues of the PBD should be varied in the
next variegation cycle? The general rule is to
preserve as much accumulated information as possible.
The amino acids just varied are the ones best
determined. The environment of other residues has
changed, so that it is appropriate to vary them again.
Because there are always more residues in the principal
and secondary sets than can be varied simultaneously,
we start by picking residues that either have never
been varied (highest priority) or that have not been
varied for one or more cycles. If we find that varying
all the residues except those varied in the previous
cycle does not allow a high enough level of diversity,
then residues varied in the previous cycle might be
varied again. For example, if the number of
independent transformants that can be produced and the
sensitivity of the affinity separation were such that
seven residues could be varied, and if the principal
and secondary sets contained 13 residues, we would
always vary seven residues, even though that implies
varying some residue twice in a row. In such cases, we
would pick the residues just varied that contain the
amino acids of highest abundance in the variegated
codons used.
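The priority rule for choosing residues can be written as a short ordering procedure. This is one reading of the rule, not a prescribed algorithm: residues never varied come first, then residues not varied for the longest time, and only then residues varied in the previous cycle, ranked by a hypothetical `abundance` score standing in for "amino acids of highest abundance in the variegated codons used". The residue numbers and scores are illustrative.

```python
def pick_residues(candidates, last_varied, current_cycle, n_to_vary, abundance):
    """Choose residues to vary in `current_cycle`.

    candidates  -- residues in the principal and secondary sets
    last_varied -- residue -> cycle in which it was last varied (absent if never)
    abundance   -- residue -> score for residues varied in the previous cycle
                   (higher = selected amino acid was more abundant in the codons)
    """
    previous = current_cycle - 1
    never = [r for r in candidates if r not in last_varied]
    older = sorted((r for r in candidates
                    if r in last_varied and last_varied[r] < previous),
                   key=lambda r: last_varied[r])       # longest-unvaried first
    just = sorted((r for r in candidates if last_varied.get(r) == previous),
                  key=lambda r: abundance.get(r, 0.0), reverse=True)
    return (never + older + just)[:n_to_vary]


# The text's example: 13 residues, seven can be varied per cycle.
candidates = list(range(1, 14))
last_varied = {r: 1 for r in range(1, 8)}  # residues 1-7 varied in cycle 1
abundance = {3: 0.9, 5: 0.4}              # hypothetical scores
print(pick_residues(candidates, last_varied, 2, 7, abundance))
# [8, 9, 10, 11, 12, 13, 3]
```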

It is the accumulation of information that allows
the process to select those protein sequences that
produce binding between the SBD and the target. Some
interfaces between proteins and other molecules involve
twenty or more residues. Complete variation of twenty
residues would generate 20^20, or about 1.0 x 10^26,
different proteins. By dividing the residues that lie
close together in space into overlapping groups of five
to seven residues, we can vary a large surface but never
need to test more than 20^5 to 20^7 (about 3.2 x 10^6
to 1.3 x 10^9) candidates at once, a savings of 20^13-
to 20^15-fold (roughly 8 x 10^16- to 3 x 10^19-fold).
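The arithmetic behind this savings is easy to reproduce. The sketch assumes, as the sentence does, that each varied residue may take any of the 20 amino acids; the function name is illustrative.

```python
def library_sizes(total_residues=20, group_size=7, alphabet=20):
    """Return (all-at-once library size, per-group library size, fold savings)."""
    full = alphabet ** total_residues
    per_group = alphabet ** group_size
    return full, per_group, full // per_group


for group in (5, 6, 7):
    full, per_group, savings = library_sizes(group_size=group)
    print(f"group of {group}: {per_group:.1e} candidates, "
          f"{savings:.1e}-fold fewer than {full:.1e}")
```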

Having picked the residues to vary, we again set
the range of variegation for each residue according to
the principles set forth in Sec. 13.2, design the vgDNA
encoding the desired mutants (Sec. 13.3), clone the
vgDNA into GPs (Sec. 14), and select-by-binding-to-target
those GPs bearing SBDs (Sec. 15).

Sec. 17.0: OTHER CONSIDERATIONS:


Sec. 17.1: Joint selections:

One may modify the affinity separation of the
method described to select a molecule that binds to
material A but not to material B. One needs to prepare
two selection columns, one with material A and the
other with material B. The population of genetic
packages is prepared in the manner described, but
before applying the population to A, one passes the
population over the B column so as to remove those
members of the population that have high affinity for B.
It may be necessary to amplify the population that
does not bind to B before passing it over A.
Amplification would most likely be needed if A and B
were in some ways similar and the PPBD had been
selected for having affinity for A.

For example, to obtain an SBD that binds A but not
B, three columns could be connected in series: a) a
column supporting some compound that is neither A nor B,
or only the matrix material, b) a column supporting B, and
c) a column supporting A. A population of GP(vgPBD)s
is applied to the series of columns, and the columns are
washed with the buffer of constant ionic strength that
is used in the application. The columns are uncoupled,
and the third column is eluted with a gradient to
isolate GP(PBD)s that bind A but not B.
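The logic of the three columns in series amounts to two negative filters followed by one positive filter. The sketch below idealizes binding as yes/no for each column; the `Package` fields and the example population are hypothetical, and real separations are graded rather than all-or-none.

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    binds_matrix: bool  # sticks to column (a): matrix alone or an unrelated compound
    binds_b: bool       # sticks to column (b)
    binds_a: bool       # sticks to column (c)


def series_selection(population):
    """Return packages that pass columns (a) and (b) and are retained on (c)."""
    passed_a = [p for p in population if not p.binds_matrix]
    passed_b = [p for p in passed_a if not p.binds_b]
    return [p for p in passed_b if p.binds_a]


population = [
    Package("sticky", True, True, True),
    Package("b_only", False, True, False),
    Package("a_and_b", False, True, True),
    Package("a_only", False, False, True),
]
print([p.name for p in series_selection(population)])  # ['a_only']
```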

One can also generate molecules that bind to both
A and B. In this case we use a 3D model and mutate one
face of the molecule in question to get binding to A.
We then mutate a different face to produce binding to
B.

The materials A and B could be proteins that
differ at only one or a few residues. For example, A
could be a natural protein for which the gene has been
cloned, and B could be a mutant of A that retains the
overall 3D structure of A. SBDs selected to bind A but
not B must bind to A near the residues that are mutated
in B. If the mutations were picked to be in the active
site of A (assuming A has an active site), then an SBD
that binds A but not B will bind to the active site of
A and is likely to be an inhibitor of A.

To obtain a protein that will bind to both A and
B, we can, alternatively, first obtain an SBD that
binds A and a different SBD that binds B. We can then
combine the genes encoding these domains so that a
two-domain single-polypeptide protein is produced. The
fusion protein will have affinity for both A and B.

One can also generate binding proteins with
affinity for both A and B, such that these materials
compete for the same site on the binding protein. We
guarantee competition by overlapping the sites for A
and B. We first create a molecule that binds to target
material A. We then vary a set of residues defined as:
a) those residues that were varied to obtain binding to
A, plus b) those residues close in 3D space to the
residues of set (a) but that are internal and so are
unlikely to bind directly to either A or B. Residues
in set (b) are likely to make small changes in the
positioning of the residues in set (a) such that the
affinities for A and B will be changed by small
amounts. Members of these populations are selected for
affinity to both A and B.
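Set (b) can be picked from a 3D model by a distance criterion. The sketch assumes one representative coordinate per residue and a list of residues already judged internal; the 6 Å cutoff, the coordinates, and the function name are hypothetical. It needs Python 3.8 or later for `math.dist`.

```python
import math


def competition_set(coords, internal, set_a, cutoff=6.0):
    """Return set (a) plus set (b): internal residues, not already in (a),
    lying within `cutoff` (same units as coords) of any residue in set (a).

    coords   -- residue -> (x, y, z) of a representative atom
    internal -- residues judged to be buried in the PBD
    set_a    -- residues varied to obtain binding to A
    """
    set_b = {
        r for r, xyz in coords.items()
        if r not in set_a and r in internal
        and any(math.dist(xyz, coords[a]) <= cutoff for a in set_a)
    }
    return set(set_a) | set_b


coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0),
          3: (20.0, 0.0, 0.0), 4: (0.0, 4.0, 0.0)}
print(sorted(competition_set(coords, internal={2, 3}, set_a={1})))  # [1, 2]
```

Residue 4 is close to residue 1 but is excluded because it is not internal; residue 3 is internal but too far away.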

Sec. 17.2: Selection for non-binding:

The method of the present invention can be used to
select proteins that do not bind to selected targets.
Consider a protein of pharmacological importance, such
as streptokinase, that is antigenic to an undesirable
extent. We can take the pharmacologically important
protein as IPBD and antibodies against it as target.
Residues on the surface of the pharmacologically
important protein would be variegated, and GP(PBD)s that
do not bind to an antibody column would be collected
and cultured. Surface residues may be identified in
several ways, including: a) from a 3D structure, b)
from hydrophobicity considerations, or c) by chemical
labeling. The 3D structure of the pharmacologically
important protein remains the preferred guide to
picking residues to vary, except that now we pick residues
that are widely spaced so that we leave as little as
possible of the original surface unaltered.

Destroying binding frequently requires only that a
single amino acid in the binding interface be changed.
If polyclonal antibodies are used, we face the problem
that all or most of the strong epitopes must be altered
in a single molecule. Preferably, one would have a set
of monoclonal antibodies, or a narrow range of antibody
species. If we had a series of monoclonal antibody
columns, we could obtain one or more mutations that
abolish binding to each monoclonal antibody. We could
then combine some or all of these mutations in one
molecule to produce a pharmacologically important
protein recognized by none of the monoclonal
antibodies. Such mutants must be tested to verify that
the pharmacologically interesting properties have not
been altered to an unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range
of binding constants for antigen. Even if we have only
polyclonal antibodies that bind to the
pharmacologically important protein, we may proceed as
follows. We engineer the pharmacologically important
protein to appear on the surface of a replicable GP.
We introduce mutations into residues that are on the
surface of the pharmacologically important protein, or
into residues thought to be on the surface of the
pharmacologically important protein, so that a
population of GPs is obtained. Polyclonal antibodies
are attached to a column and the population of GPs is
applied to the column at low salt. The column is
eluted with a salt gradient. The GPs that elute at the
lowest concentration of salt are those which bear
pharmacologically important proteins that have been
mutated in a way that eliminates binding to the
antibodies having maximum affinity for the
pharmacologically important protein. The GPs eluting
at the lowest salt are isolated and cultured. The
isolated SBD becomes the PPBD for further rounds of
variegation so that the antigenic determinants are
successively eliminated.

Sec. 17.3: Selection of PBDs for retention of structure:

We can select for insertions or deletions that
preserve the 3D structure of known binding proteins.
Consider a GP that expresses BPTI on its surface. In
the bpti-osp gene, we can replace the codons for K26
and A27 with five variegated codons (3.2 x 10^6
sequences). K26 and A27 are in a turn and are far from
the trypsin-binding surface. We use
selection-through-binding to isolate GPs expressing mutants of BPTI that
retain high, specific affinity for trypsin.
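The figure of 3.2 x 10^6 sequences corresponds to 20^5, i.e. all 20 amino acids at each of five positions. How many DNA sequences and stop codons that implies depends on the codon scheme, which this passage does not specify; the sketch below assumes, purely for illustration, NNK codons (N = A/C/G/T, K = G/T) and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}
AMBIGUITY = {"N": "ACGT", "K": "GT"}


def codon_stats(pattern):
    """Return (#codons, #distinct amino acids, #stop codons) for a degenerate codon."""
    codons = ["".join(c) for c in product(*(AMBIGUITY.get(ch, ch) for ch in pattern))]
    translated = [CODON_TABLE[c] for c in codons]
    return len(codons), len(set(translated) - {"*"}), translated.count("*")


n_codons, n_aa, n_stop = codon_stats("NNK")
positions = 5
print("DNA sequences:    ", n_codons ** positions)   # 33554432
print("protein sequences:", n_aa ** positions)       # 3200000
print("stop-free clones:  %.2f" % ((1 - n_stop / n_codons) ** positions))
```

Under this assumption about 15% of clones would carry at least one stop codon in the insert; a different variegation scheme would change these figures.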

Sec. 17.4: Created binding proteins not unique:

For each target, there are a large number of SBDs
that may be found by the method of the present
invention. To increase the probability that some PBD
in the population will bind to the target, we generate
as large a population as we can conveniently subject to
selection-through-binding. Key questions in management
of the method are "How many transformants can we
produce?" and "How small a component can we find
through selection-through-binding?" Geneticists
routinely find mutations with frequencies of one in
10^… using simple, powerful selections. The optimum
level of variegation is determined by the maximum
number of transformants and the selection sensitivity,
so that for any reasonable sensitivity we may use a
progressive process to obtain a series of proteins with
higher and higher affinity for the chosen target
material. Enrichments of 1000-fold by a single pass of
elution from an affinity plate have been demonstrated
(SMIT85).
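A rough way to see how the number of transformants and the selection sensitivity bound the level of variegation: if every varied residue may take all 20 amino acids, k residues give 20^k sequences, and k is limited both by the number of independent transformants and by the smallest component the selection can find. This simplification ignores the restricted variegation ranges of Sec. 13.2, and the input values below are hypothetical.

```python
def max_varied_residues(n_transformants, min_detectable_fraction, alphabet=20):
    """Largest k with alphabet**k <= n_transformants and
    1 / alphabet**k >= min_detectable_fraction (all residues fully varied)."""
    limit = min(n_transformants, 1.0 / min_detectable_fraction)
    k = 0
    while alphabet ** (k + 1) <= limit:
        k += 1
    return k


# Hypothetical capabilities: 1e9 transformants, selection finds 1 in 1e9.
print(max_varied_residues(1e9, 1e-9))   # 6  (20**7 = 1.28e9 is too many)
print(max_varied_residues(2e9, 1e-10))  # 7
```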

Use of different variation schemes can yield
different binding proteins. For any given target, a
large plurality of proteins will bind to it. Thus, if
one binding protein turns out to be unsuitable for some
reason (e.g. too antigenic), the procedure can be
repeated with different variation parameters. For
example, one might choose different residues to vary or
pick a different nt distribution at variegated codons
so that a new distribution of amino acids is tested at
the same residues. Even if the same principal set of
residues is used, one might obtain a different SBD if
the order in which one picks subsets to be varied is
altered.

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population
of GPs discussed herein are not the only modes
possible. Any method of mutagenesis that preserves at
least a large fraction of the information obtained from
one selection and then introduces other mutations in
the same domain will work. The limiting factors are
the number of independent transformants that can be
produced and the amount of enrichment one can achieve
through affinity separation. Therefore the preferred
embodiment uses a method of mutagenesis that focuses
mutations into those residues that are most likely to
affect the binding properties of the PBD and are least
likely to destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs
to be considered. For example, the bacteriophage
lambda is not a useful cloning vehicle for cassette
mutagenesis because of the plethora of restriction
sites. One can, however, use
single-stranded-oligo-nt-directed mutagenesis on lambda
without the need for unique restriction sites. No one has used
single-stranded-oligo-nt-directed mutagenesis to introduce the
high level of diversity called for in the present
invention, but if it is possible, such a method would
allow use of phage with large genomes.



Example 1



BPTI-Derived Binding Protein for HHMb, Displayed by M13 Phage

Presented below is a hypothetical example of a
protocol for developing a new binding molecule derived
from BPTI with affinity for horse heart myoglobin
(HHMb) using the common E. coli bacteriophage M13 as
genetic package. It will be understood that some
further optimization, in accordance with the teachings
herein, may be necessary to obtain the desired results.
Possible modifications in the preferred method are
discussed immediately following various steps of the
hypothetical example.

By hypothesis, we set the following technical
capabilities:

     500 ng/synthesis of ssDNA 100 bases long,
     10 ug/synthesis of ssDNA 60 bases long,
     1 mg/synthesis of ssDNA 20 bases long.

     …DNA        100 bases          1 mg/l

     0.1 % for blunt-blunt,
     4 % for sticky-blunt,
     11 % for sticky-sticky.

     M_ntv       5 x 10^…
                 900-fold enrichment
     C_sensi     1 in 4 x 10^…
     N_chrom     … passes
     …err        0.05
                 8



Example 1, Part I

In this example, we will use M13 as a replicable
GP and BPTI as IPBD. In Part I, we are concerned only
with getting BPTI displayed on the outer surface of an
M13 derivative. Variable DNA may be introduced in the
osp-ipbd gene, but not within the region that codes for
the trypsin-binding region of BPTI. Once BPTI is
displayed on the outer surface of an M13
derivative, we proceed to Part II to optimize the
affinity separation procedures.

For this example, we choose a filamentous
bacteriophage of E. coli, M13. We prefer phage over
vegetative bacterial cells because phage are much less
metabolically active. We prefer phage over spores
because the molecular mechanisms of virion
formation and the 3D structure of the virion are much
better understood than are the corresponding processes
of spore formation and structures of spores.


M13 is a very well studied bacteriophage, widely
used for DNA sequencing and as a genetic vector; it is
a typical member of the class of filamentous phages.
The relevant facts about M13 and other phages that will
allow us to choose among phages are cited in Sec.
1.3.1.

Compared to other bacteriophage, filamentous phage
in general are attractive and M13 in particular is
especially attractive because:

1) the 3D structure of the virion is known,

2) the processing of the coat protein is well
understood,

3) the genome is expandable,

4) the genome is small,

5) the sequence of the genome is known,

6) the virion is physically resistant to shear,
heat, cold, guanidinium Cl, low pH, and high salt,

7) the phage is a sequencing vector so that
sequencing is especially easy, and

8) antibiotic-resistance genes have been cloned
into the genome with predictable results (HXNE80).

Other criteria listed in Secs. 1.0 and 1.3 are
also satisfied: M13 is easily cultured and stored
(FRIT85), each infected cell yielding 100 to 1000 M13
progeny after infection. M13 has no unusual or
expensive media requirements and is easily harvested
and concentrated (SALI64, YAMA70, FRIT85); M13 is
stable toward physical agents: temperature (10% of
phage survive 30 minutes at 85°C), shear (Waring
blender does not kill), desiccation (not applicable),
radiation (not applicable), age (stable for years).

M13 is stable toward chemicals: pH (< 2.2
(SMIT85)), surface active agents (not applicable),
chaotropes (guanidinium HCl = 6.0 M), ions (no specific
sensitivities), organic solvents (ether and other
organic solvents are lethal (MAIIV78)), proteases (not
applicable, HHMb not a protease). M13 is not known to
be sensitive to other enzymes.

The M13 genome is 6423 bp and its sequence is known
(SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is
single-stranded oligo-nt directed mutagenesis (FRIT85).

M13 is a plasmid and transformation system in itself,
and an ideal sequencing vector. M13 can be grown on
Rec- strains of E. coli. The M13 genome is expandable
(MESS78, FRIT85). M13 confers no advantage on its host,
but it does not lyse cells. The sequence of gene VIII is
known, and the amino acid sequence can be encoded on a
synthetic gene, using the lacUV5 promoter in
conjunction with the LacIq repressor. The lacUV5
promoter is induced by IPTG. Gene VIII protein is
secreted by a well-studied process and is cleaved
between A23 and A24. Residues 18, 21, 22, and 23 of
gene VIII protein control cleavage. Mature gene VIII
protein makes up the sheath around the circular ssDNA.
The 3D structure of the f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is
on the surface of the virion. No fusions to M13 gene VIII
protein have been reported. The secondary structure of M13
coat protein is implicit in the 3D structure. Mature
M13 gene VIII protein has only one domain. There are
four minor proteins: those of genes III, VI, VII, and IX.
Each of these minor proteins is present in about 5 copies
per virion and is related to morphogenesis or
infection. The major coat protein is present in more
than 2500 copies per virion.

Although no fusions of M13 gene VIII to other
genes have been reported, knowledge of the virion 3D
structure (BANN81) makes attachment of the IPBD to the
amino terminus of mature M13 coat protein (M13 CP)
quite attractive. Should direct fusion of BPTI to M13
CP fail to cause BPTI to be displayed on the surface of
M13, we will vary part of the BPTI sequence and/or
insert short random DNA sequences between BPTI and M13
CP.

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI
cannot be made to appear on the virion outer surface
by fusing the bpti gene to the m13 cp gene, we will fuse
bpti to gene III either at the site used by Smith and
by de la Cruz et al. or to one of the termini. We will
use a second, synthetic copy of gene III so that some
unaltered gene III protein will be present.

The gene VIII protein is chosen as OSP because it
is present in many copies and because its location and
orientation in the virion are known. Note that any
uncertainty about the azimuth of the coat protein about
its own alpha-helical axis is unimportant.

The 3D model of f1 indicates strongly that fusing
BPTI to the amino terminus of M13 CP is more likely to
yield a functional protein than any other fusion site
(See Sec. 1.3.3).
The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1

        10        20   v    30        40        50
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA

        60        70
MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes
for ambiguous DNA are internationally recognized
(GEOR87). The best site for inserting a novel protein
domain into M13 CP is after A23, because SP-I cleaves
the precoat protein after A23, as indicated by the
arrow. Proteins that can be secreted will appear
connected to mature M13 CP at its amino terminus.
Because the amino terminus of mature M13 CP is located
on the outer surface of the virion, the introduced
domain will be displayed on the outside of the virion.
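The SP-I processing step can be mimicked in silico by splitting the precoat after residue 23. The sketch below is a minimal Python illustration that assumes only the AA_seq1 string. The helper name `process_precoat` is hypothetical, and the code says nothing about how membrane insertion or cleavage actually happen.

```python
# AA_seq1: M13 precoat protein, 73 residues, as listed above.
AA_SEQ1 = (
    "MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA"
    "MVVVIVGATIGIKLFKKFTSKAS"
)


def process_precoat(precoat: str, cleave_after: int = 23) -> tuple:
    """Split a precoat at the SP-I site, i.e. after residue `cleave_after` (1-based).

    Returns (signal_peptide, mature_protein).
    """
    if not 0 < cleave_after < len(precoat):
        raise ValueError("cleavage position lies outside the sequence")
    return precoat[:cleave_after], precoat[cleave_after:]


signal, mature_cp = process_precoat(AA_SEQ1)
print(len(signal), signal[-1])        # 23 A  (residue 23, the Ala before the cut)
print(len(mature_cp), mature_cp[:5])  # 50 AEGDD  (amino terminus of mature M13 CP)
```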

BPTI is chosen as IPBD of this example (See Sec.
2.1) because it meets or exceeds all the criteria: it
is a small, very stable protein with a well-known 3D
structure. Marks et al. (MARK86) have shown that a
fusion of the phoA signal peptide gene fragment and DNA
coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating
that there is nothing in the structure of BPTI to
prevent its being secreted.

Marks et al. (MARK87) also showed that the
structure of BPTI is stable even to the removal of one
of the cystine bridges. They did this by replacing
both C14 and C38 with either two alanines or two
threonines. The C14/C38 cystine bridge that Marks et
al. removed is the one very close to the scissile bond
in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that
BPTI is redundantly stable and so is likely to fold
into approximately the same structure despite numerous
surface mutations. Using the knowledge of homologues
(vide infra), we can infer which residues must not be
varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at
high resolution by X-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction
(WLOD84), and NMR (WAGN87). In one of the X-ray
structures deposited in the Brookhaven Protein Data
Bank, "6PTI", there was no electron density for A58,
indicating that A58 has no uniquely defined
conformation. Thus we know that the carboxy group does
not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to
the carboxy terminus. Goldenberg and Creighton
reported on circularized BPTI and circularly permuted
BPTI (GOLD83). Some proteins homologous to BPTI have
more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83).

BPTI has the added advantage that at least 32
homologous proteins are known, as shown in Table 13. A
tally of ionizable groups is shown in Table 14, and the
composite of amino acid types occurring at each residue
is shown in Table 15.
BPTI is freely soluble and is not known to bind
metal ions. BPTI has no known enzymatic activity.
BPTI binds to trypsin with K_d = 6.0 x 10^-14 M (TSCH87).
BPTI is not toxic. If K15 of BPTI is changed to L,
there is no measurable binding between the mutant BPTI
and trypsin (TSCH87).

All of the conserved residues are buried; of the
seven fully conserved residues, only G37 has noticeable
exposure. The solvent accessibility of each residue in
BPTI is given in Table 16. It was calculated from the
entry "6PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 Å, the atomic radii given in
Table 7, and the method of Lee and Richards (LEEB71).
Each of the 51 non-conserved residues can accommodate
two or more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
approximately 7 x 10*^ different amino acid sequences,
most of which will fold into structures very similar to
BPTI.
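The number of candidate sequences is the product, over the varied positions, of the number of amino acid types allowed at each position. Table 15 is not reproduced here, so the counts in this minimal sketch are hypothetical placeholders.

```python
import math


def library_size(choices_per_position: list) -> int:
    """Number of distinct sequences when each position varies independently
    over the given number of amino acid types."""
    return math.prod(choices_per_position)


# Hypothetical counts for five positions; real per-residue counts would be
# read off Table 15 for all 51 non-conserved residues.
example_counts = [2, 3, 2, 4, 5]
size = library_size(example_counts)
print(size, round(math.log10(size), 2))   # 240 2.38
```

Because each of the 51 positions admits at least two residues, the true total is at least 2^51.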

BPTI will be useful as an IPBD for macromolecules
(See Sec. 2.1.1). BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very
high pH; thus BPTI is useful as IPBD for targets that
are not also strongly positive under the conditions of
intended use (see Sec. 2.1.2). There exist homologues
of BPTI, however, having quite different charges (viz.
SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a
derivative of M13 is found that displays BPTI on its
surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.
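The claim that BPTI is strongly positive except at very high pH can be checked roughly with a Henderson-Hasselbalch estimate. The sketch below assumes generic textbook-style pKa values, not measured values for folded BPTI. It uses the mature BPTI sequence (residues 24-81 of AA_seq2 in this section) and treats the six cysteines, all in disulfides, as non-ionizable.

```python
# Generic pKa values (an assumption; residues in folded BPTI are perturbed).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 9.0, 3.1

# Mature BPTI; cysteine is deliberately omitted from the ionizable groups.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def positive_fraction(pka: float, ph: float) -> float:
    return 1.0 / (1.0 + 10 ** (ph - pka))


def negative_fraction(pka: float, ph: float) -> float:
    return 1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at `ph`."""
    charge = positive_fraction(PKA_N_TERM, ph) - negative_fraction(PKA_C_TERM, ph)
    for aa in seq:
        if aa in PKA_BASIC:
            charge += positive_fraction(PKA_BASIC[aa], ph)
        elif aa in PKA_ACIDIC:
            charge -= negative_fraction(PKA_ACIDIC[aa], ph)
    return charge


for ph in (7.0, 10.0, 13.0):
    print(ph, round(net_charge(BPTI, ph), 1))
```

With these assumptions, the estimate is about +6 at pH 7, still positive at pH 10, and negative only near pH 13.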

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is
quite small; if this should cause a pharmacological
problem, two or more BPTI-derived domains may be joined,
as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV (See
Sec. 3). A "phagemid" is a hybrid between a phage and
a plasmid, and is used in this invention. Double-stranded
plasmid DNA isolated from phagemid-bearing
cells is denoted by the standard convention, e.g.
pXY24. Phage prepared from these cells would be
designated XY24. Phagemids such as Bluescript K/S
(sold by Stratagene) are not suitable for our purposes
because Bluescript does not contain the full genome of
M13 and must be rescued by coinfection with helper
phage. Such coinfections could lead to genetic
recombination yielding heterogeneous phage unsuitable
for the purposes of the present invention.

The bacteriophage M13 bla 61 (ATCC 37039) is
derived from wild-type M13 through the insertion of the
beta-lactamase gene (HINE80). This phage contains 8.13
kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from
M13 bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point
to construct a usable cloning vector for the present
example.


The OCV for the current example is constructed by
a process illustrated in Figure A. A brief description
of all the plasmids and phagemids constructed for this
Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis,
multiple primers lead to higher efficiency. Three
non-mutagenic primers are used: bases 2326-2352 of wt M13,
bases 4854-4875 of wt M13, and the complement of bases
3431-3451 of pBR322. Note that pLG2 and its
derivatives carry the anti-sense strand of the amp^R
gene in the + DNA strand. The segments are picked to
be high in GC content and to divide the pLG7 genome
into several segments of approximately equal length.

The genetic engineering procedures needed to
construct the OCV are standard, using commercially
available restriction enzymes under recommended
conditions. All restriction fragments of DNA are
purified by electrophoresis or HPLC. M13 and its
engineered derivatives are infected into E. coli strain
PE384 (F+, Rec-, Sup+, AmpS). Plasmid DNA of M13
derivatives is transformed into E. coli strain PE383
(F-, Rec-, Sup+, AmpS) so that we avoid multiple rounds of
infection in the culture. Isolation of M13 phage is by
the procedure of Salivar et al. (SALI64); isolation of
replicative form (RF) M13 is by the procedure of
Jazwinski et al. (JAZW73a and JAZW73b). Isolation of
plasmids containing the ColE1 origin of replication is
by the method of Maniatis (MANI82).

We pick the amp^R gene from pBR322 as a convenient
antibiotic resistance gene. Another resistance gene,
such as one for kanamycin resistance, could be used. The
Acc I-to-Aat II fragment of pBR322 is a conveniently
obtained source of amp^R and the ColE1 origin.

M13mp18 (New England BioLabs) contains neither Aat
II nor Acc I sites. Therefore we insert an adaptor
that allows us to insert the Aat II-to-Acc I fragment
of pBR322 that carries the amp^R gene and the ColE1
origin of replication into a desirable place in
M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ
gene that are not useful for the purposes of the present
invention. By cutting M13mp18 with Ava II and Bsu36 I
and discarding the approximately 600 intervening base
pairs, we eliminate all recognition sites of several
enzymes useful for engineering the bpti-gene VIII gene.

The following adaptor is synthesized:

5' GACCGACGTCtgcctcGTATACCGGACCGcatagctCC 3'     olig#1
3'    GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT 5'  olig#2
AvaII  AatII       AccI  RsrII        Bsu36I

The annealed adaptor is ligated with RF M13mp18
that has been cut with both Ava II and Bsu36 I and
purified by PAGE or HPLC. Transformed cells are
selected for plasmid uptake with ampicillin. The
resulting construct is called pLG1.
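The adaptor design can be checked in silico. The sketch below is plain Python with hypothetical helper names. It re-derives the single-stranded ends of the annealed olig#1/olig#2 duplex and locates the internal Aat II, Acc I and Rsr II sites. Positions are 0-based along olig#1 read 5' to 3'. Only the IUPAC codes needed here are included.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case is preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


OLIG1 = "GACCGACGTCtgcctcGTATACCGGACCGcatagctCC"        # written 5'->3'
OLIG2 = "GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT"[::-1]  # printed 3'->5'; reversed to 5'->3'


def five_prime_overhangs(top: str, bottom: str, min_pair: int = 10):
    """Anneal `bottom` to `top` (both 5'->3') and return their 5' overhangs.

    Assumes the layout of this adaptor, where both strands leave 5' overhangs.
    Returns None if no perfectly paired duplex of at least `min_pair` bp exists.
    """
    top_u = top.upper()
    partner = revcomp(bottom).upper()  # bottom strand expressed on the top strand
    for shift in range(len(top_u)):
        paired = min(len(top_u) - shift, len(partner))
        if paired >= min_pair and top_u[shift:shift + paired] == partner[:paired]:
            return top_u[:shift], revcomp(partner[paired:])
    return None


IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "M": "[AC]", "K": "[GT]"}


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of a recognition site (IUPAC codes allowed)."""
    pattern = "".join(IUPAC[b] for b in site)
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


print(five_prime_overhangs(OLIG1, OLIG2))  # ('GAC', 'TCA')
for name, site in [("Aat II", "GACGTC"), ("Acc I", "GTMKAC"), ("Rsr II", "CGGWCCG")]:
    print(name, find_sites(OLIG1, site))
```

The GAC end is the Ava II-compatible end. The TCA end is a three-base 5' overhang of the kind Bsu36 I (CC^TNAGG) leaves.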

DNA from pLG1 is cut with both Aat II and Acc I.
The Aat II-to-Acc I fragment of pBR322 is ligated to the
backbone of pLG1. The correct construct is named pLG2.

The Acc I restriction site is no longer needed for
vector construction. To eliminate this site, RF pLG2
dsDNA is cut with Acc I, treated with Klenow fragment
and dATP and dTTP to make it blunt, and then religated.
The cloning vector, named pLG3, is now ready for
stepwise insertion of the osp-ipbd gene.
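Cutting, filling in with Klenow fragment and blunt religation duplicate the two-base Acc I overhang, and this destroys the GT^MKAC site. The sketch below models that on a stand-in sequence, the adaptor top strand, whose Acc I site is GTATAC. The actual sequence at the pLG2 site is not reproduced here. The function names are hypothetical.

```python
import re

ACC_I = "GT[AC][GT]AC"  # Acc I site GT^MKAC; cutting leaves a 2-base 5' overhang (MK)


def fill_in_and_religate(seq: str, site: str, top_cut: int, bottom_cut: int) -> str:
    """Model: cut the first site, fill in both 5' overhangs with Klenow
    fragment, and ligate the blunt ends. Cut offsets are measured from the
    start of the site on the top strand; the overhang ends up duplicated."""
    m = re.search(site, seq)
    if m is None:
        raise ValueError("site not found")
    left_blunt = seq[:m.start() + bottom_cut]  # left end after fill-in
    right_blunt = seq[m.start() + top_cut:]    # right end after fill-in
    return left_blunt + right_blunt


def dntps_needed(overhang: str) -> set:
    """Nucleotides consumed when the overhangs on both ends are filled in."""
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return set(overhang) | {pair[b] for b in overhang}


adaptor = "GACCGACGTCTGCCTCGTATACCGGACCGCATAGCTCC"  # stand-in sequence
blunted = fill_in_and_religate(adaptor, ACC_I, top_cut=2, bottom_cut=4)
site_start = re.search(ACC_I, adaptor).start()
overhang = adaptor[site_start + 2:site_start + 4]
print(blunted[14:24], re.search(ACC_I, blunted), sorted(dntps_needed(overhang)))
```

For an AT overhang, the model needs only dATP and dTTP, and the resulting GTATATAC no longer matches GTMKAC.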



We are now ready to design a gene (See Sec. 4)
that will cause BPTI domains to appear on the outer
surface of an M13 derivative: LG7.

To obtain a novel protein domain attached to the
outside of M13, we insert DNA that codes for mature
BPTI after A23 of the precoat protein of M13. Mature
BPTI begins with an arginine residue, which is charged;
cleavage by signal peptidase I is normal in such cases.
Signal peptidase I (SP-I) cuts a chimera of M13 coat
protein and BPTI after A23, leaving mature BPTI attached
at its carboxy end to the amino terminus of M13 CP.

The following amino-acid sequence, called AA_seq2,
is constructed by inserting the sequence for mature
BPTI (shown underscored) immediately after the signal
sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP:

AA_seq2

        10        20   v    30        40        50
MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA
                       ---------------------------

        60        70        80        90       100
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
-------------------------------

       110       120       130
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS

Sequence numbers of fusion proteins refer to the
fusion, as coded, unless otherwise noted. Thus the
alanine that begins M13 CP is referred to as "number
82", "number 1 of M13 CP", or "number 59 of the mature
BPTI-M13 CP fusion".
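The numbering convention can be reproduced by assembling AA_seq2 from its three parts. The sketch below assumes the part sequences shown in AA_seq1 and AA_seq2 above. The constant and function names are hypothetical.

```python
SIGNAL = "MKKSLVLKASVAVATLVPMLSFA"                                  # precoat 1-23
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"  # mature BPTI, 58 aa
M13_CP = "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"        # mature M13 CP, 50 aa

AA_SEQ2 = SIGNAL + BPTI + M13_CP
BPTI_START = len(SIGNAL) + 1       # fusion number of BPTI residue 1
CP_START = BPTI_START + len(BPTI)  # fusion number of M13 CP residue 1


def to_fusion(part_start: int, residue: int) -> int:
    """Convert a 1-based residue number within a part to AA_seq2 numbering."""
    return part_start + residue - 1


def residue_at(fusion_number: int) -> str:
    return AA_SEQ2[fusion_number - 1]


print(len(AA_SEQ2), CP_START, residue_at(CP_START))         # 131 82 A
print(to_fusion(BPTI_START, 5), to_fusion(BPTI_START, 55))  # 28 78
print(CP_START - len(SIGNAL))  # 59: position of that Ala in the mature fusion
```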

The osp-ipbd gene is regulated by the lacUV5
promoter and terminated by the trpA transcription
terminator. The host strain of E. coli harbors the
lacIq gene. The osp-ipbd gene is expressed and
processed in parallel with the wild-type gene VIII.
The novel protein, which consists of BPTI tethered to an
M13 CP domain, constitutes only a fraction of the coat.
Affinity separation is able to separate phage carrying
only five or six copies of a molecule that has high
affinity for an affinity matrix (SMIT85); 1%
incorporation of the chimeric protein results in about
30 copies of the protein exposed on the surface. If
this is insufficient, additional copies may be provided
by, for example, increasing the IPTG concentration.
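The copy-number estimate is simple arithmetic: displayed copies equal the chimeric fraction times the number of major coat proteins per virion. The sketch below assumes a round 2,700 coat proteins per virion, consistent with the "more than 2500" stated earlier. It also computes the incorporation needed to reach the five-to-six-copy threshold.

```python
def displayed_copies(fraction_chimeric: float, coat_copies: int) -> float:
    """Expected number of chimeric coat proteins per virion."""
    return fraction_chimeric * coat_copies


def min_fraction(copies_needed: float, coat_copies: int) -> float:
    """Smallest incorporation fraction giving `copies_needed` copies per virion."""
    return copies_needed / coat_copies


COAT_COPIES = 2700  # assumed; the text states only "more than 2500"
print(displayed_copies(0.01, COAT_COPIES))
print(f"{min_fraction(6, COAT_COPIES):.2%}")  # 0.22%
```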



A model comprising M13 coat, after the model for
f1 of Marvin and colleagues (BANN81), and a BPTI
domain, taken from the Brookhaven Protein Data Bank
entry "6PTI", was constructed by standard model
building methods that ensure that covalent bond lengths
and angles are close to acceptable values. The model
shows that the fusion protein could fit into the
supramolecular structure in a stereochemically
acceptable fashion without disturbing the internal
structure of either the M13 CP or BPTI domain.

The ambiguous DNA sequence coding for AA_seq2 is
examined by a computer program for places where
recognition sites for restriction enzymes could be
created without altering the amino-acid sequence (See
Sec. 4.3). A master table of enzymes is compiled from
the catalogues of enzyme suppliers, and from it we take
the enzymes that do not cut the OCV (preferably
constructed as described above).

Using the procedure given in Sec. 4.3, we design an
ipbd gene such as that shown in Table 25. Some
restriction enzymes (e.g. Ban I or Hph I) cut the OCV
too often to be of value.
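The program of Sec. 4.3 is not reproduced here, but the core question it answers can be sketched simply: can a recognition site be placed at a given offset using only synonymous codons? The sketch below assumes the standard genetic code and checks one site against one peptide. Each codon that overlaps the site must have at least one synonym that agrees with the site's (IUPAC) bases at the overlapping positions. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes (a subset sufficient for common recognition sites).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "W": "AT", "S": "CG", "M": "AC", "K": "GT", "N": "ACGT"}


def synonymous_codons(aa: str) -> list:
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def codon_fits(codon: str, codon_index: int, start: int, site: str) -> bool:
    """Does `codon` at `codon_index` agree with `site` placed at nucleotide `start`?"""
    for i, base in enumerate(codon):
        pos = 3 * codon_index + i
        if start <= pos < start + len(site) and base not in IUPAC[site[pos - start]]:
            return False
    return True


def silent_site_offsets(peptide: str, site: str) -> list:
    """Nucleotide offsets (0-based, in the coding DNA of `peptide`) at which
    `site` could be created using synonymous codons only."""
    hits = []
    for start in range(3 * len(peptide) - len(site) + 1):
        first, last = start // 3, (start + len(site) - 1) // 3
        if all(any(codon_fits(c, ci, start, site) for c in synonymous_codons(peptide[ci]))
               for ci in range(first, last + 1)):
            hits.append(start)
    return hits


print(silent_site_offsets("AS", "GCTAGC"))  # [0]: Ala-Ser as GCT AGC (Nhe I site)
print(silent_site_offsets("GP", "GGWCC"))   # [0]: Gly-Pro as GGW CCN (Ava II site)
```

A full search would also screen each candidate enzyme against the OCV, as the text describes.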

The entire DNA sequence of the m13cp-bpti fusion,
with annotation, appears in Table 25. It shows the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the
Shine-Dalgarno sequence, the amino acid sequence, the stop
codons, and the transcriptional terminator.

The ipbd gene is synthesized in several steps
using the method described in Sec. 5.1, generating
dsDNA fragments of 150 to 190 base pairs.

The four steps (See Sec. 6.1) by which we clone
synthetic fragments of the m13cp-bpti gene (the
osp-ipbd gene of the present example) into pLG3 and its
derivatives are illustrated in Figure 5.

The sequence to be introduced into pLG3 comprises
a) the segment from Rsr II to Avr II (Table 25), b) a
spacer sequence (gccgctcc), and c) the segment from
Asu II to Sau I. The segment is 158 bases long and is
synthesized from two shorter synthetic oligo-nts as
described in Sec. 5.1 of the generic specification.

Table 27 shows the antisense strand of the
sequence to be inserted. The 99-base fragment shown in
upper case letters and underscored (5'-CCGTCG...CCTTCG-3',
olig#3) is synthesized in the standard manner.
Similarly, the 100-base fragment of the sense strand
shown in lower case (5'-cgctca...aattg-3', olig#4) is
synthesized. After annealing, the double-stranded
region is extended with Klenow fragment by the
procedure given above to make the entire 176 bases
double stranded. The overlap region is 23 base pairs
long and contains 14 CG pairs and 9 AT pairs. The DNA
between Avr II and Asu II does not code for anything in
the final ipbd gene; it is there so that the DNA can be
cut by both Avr II and Asu II at the same time in the
next step. Eight bases have been added to the left of
Rsr II and nine bases have been added to the left of
Sau I (same specificity and cutting pattern as
Bsu36 I). These bases at the ends are not part of the
final product; they must be present so that the
restriction enzymes can bind and cut the synthetic DNA
to produce specific sticky ends.
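The length bookkeeping above (99 + 100 - 23 = 176 bases) follows from annealing two oligos at their 3' ends and filling in the remainder with Klenow fragment. The full olig#3 and olig#4 sequences are elided here, so this minimal sketch uses short hypothetical oligos. The function name `anneal_and_extend` is likewise hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def anneal_and_extend(sense: str, antisense: str, min_overlap: int = 4) -> tuple:
    """Anneal two oligos (both 5'->3') at their 3' ends and return the
    filled-in product (as its sense strand) and the overlap length."""
    partner = revcomp(antisense)  # antisense oligo expressed on the sense strand
    for k in range(min(len(sense), len(partner)), min_overlap - 1, -1):
        if sense[-k:] == partner[:k]:  # longest 3'-terminal overlap wins
            return sense + partner[k:], k
    raise ValueError("no 3'-terminal overlap found")


# Hypothetical short oligos standing in for olig#3 / olig#4.
sense_oligo = "ATGACCGTT"
antisense_oligo = "AGCTCAACGG"
product_seq, overlap = anneal_and_extend(sense_oligo, antisense_oligo)
print(product_seq, overlap)          # ATGACCGTTGAGCT 5
print(99 + 100 - 23)                 # 176, the fully double-stranded length quoted above
```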

The synthetic DNA is cut with both Sau I and Rsr II
and is ligated to similarly cut dsDNA of pLG3. The
construct with the correct insert is called pLG4.

The second step of the construction of the OCV is
illustrated in Table 28. As in the construction of
pLG4, two pieces of single-stranded DNA are
synthesized: a 99-base fragment of the anti-sense
strand ending with P25 and a 99-base fragment of the
sense strand starting with P18. Both the synthetic dsDNA
and dsRF pLG4 DNA are cut with both Avr II and Asu II
and are ligated and used to transform E. coli. The
construct carrying this second insert is called pLG5.

Construction of pLG6 proceeds similarly to the
construction of pLG5. The sequence is shown in Table
30. The two single-stranded segments (one from the
anti-sense strand ending with N66 and the other from
the sense strand starting with the third base of the
codon for Y58) are synthesized, annealed, and extended
with Klenow fragment. Both the synthetic DNA and RF
pLG5 are cut with both BssH II and Asu II, purified, and
the appropriate pieces are ligated and used to
transform E. coli.

The construction of pLG7 is illustrated in Table
32 and proceeds similarly to the constructions of pLG4,
pLG5, and pLG6. The two single-stranded segments (one
from the anti-sense strand ending with the first base
of the codon for V110 and the other beginning with
E101) are synthesized, annealed, and extended with
Klenow fragment. Both the synthetic DNA and RF pLG6
are cut with both Bbe I and Asu II, purified, and the
appropriate pieces are ligated and used to transform E.
coli. The construct with the correct fourth insert is
called pLG7; the display of BPTI on the outer surface
of LG7 is verified by the methods of Sec. 8.

M13am429 is an amber mutant of M13 used to
reduce non-specific binding by the affinity matrix for
phages derived from M13. M13am429 is derived by
standard genetic methods (MILL72) from wtM13.

Phage LG7 is grown on E. coli strain PE384 in LB
broth with various concentrations of IPTG added to the
medium to induce the osp-ipbd gene. Phage LG7 is
obtained from cells grown with 0.0, 0.1, 1.0, 10.0, or
100.0 uM, or 1.0 mM IPTG, harvested (See Sec. 7) by the
method of Salivar (SALI64), and concentrated to obtain
a titre of 10^12 pfu/ml by the method of Messing
(MESS83).
The preferred method of determining whether LG7
displays BPTI on its surface (See Sec. 8) is to
determine whether these phage can retain a labeled
derivative of trypsin (trp) or anhydrotrypsin (AHTrp)
on a filter that allows passage of unbound trp or
AHTrp. Trypsin contains 10 tyrosine residues and can
be iodinated with 125I by standard methods; we denote
the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be
used on trp or AHTrp, e.g. biotin or a fluorescent
label. AHTrp* or trp* is labeled to an activity of 0.3
uCi/ug. A sample of 10^12 LG7(10 mM IPTG) is mixed with
1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM
KCl, adjusted to pH 8.0 with 1 mM K2HPO4 / KH2PO4. The
mixture is passed through an Amicon MSP1 system fitted
with a membrane filter that allows passage of proteins
smaller than M_r = 300,000. Filters are soaked in
buffer containing trp or AHTrp prior to the analysis.
The filter is washed twice with 0.5 ml of buffer
containing trp or AHTrp. The radioactivity retained on
the filter is quantitated with a scintillation counter
or other suitable device. If each virion displays one
copy of BPTI, then 0.05 ug of protein can be bound,
which would give rise to 3 x 10^4 disintegrations/minute
on the filter.
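The expected signal follows from the specific activity. This minimal sketch uses only the figures quoted above (0.05 ug bound at 0.3 uCi/ug) and the definition 1 uCi = 3.7 x 10^4 Bq = 2.22 x 10^6 disintegrations per minute.

```python
DPM_PER_UCI = 2.22e6  # 1 uCi = 3.7e4 Bq = 2.22e6 disintegrations per minute


def expected_dpm(bound_ug: float, specific_activity_uci_per_ug: float) -> float:
    """Disintegrations per minute from a given mass of labeled protein."""
    return bound_ug * specific_activity_uci_per_ug * DPM_PER_UCI


dpm = expected_dpm(0.05, 0.3)  # 0.05 ug of trp* or AHTrp* at 0.3 uCi/ug
print(f"{dpm:.1e} dpm")        # 3.3e+04 dpm
```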

An alternative way to quantitate display of BPTI
on the surface of LG7 is to use the stoichiometric
binding between trypsin and BPTI to titrate the BPTI.
A solution that titers 10^12 pfu/ml of a phage is
approximately 1.6 x 10^-9 M in phage if each virion is
infective. The ratio of pfu to total phage can be
determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm, corrected
for the increased length of LG7 as compared to wtM13.
For example, if a 1.0 ml solution that contains 10^12
pfu of LG7 phage grown with 1.0 mM IPTG inhibits
trypsin solutions up to 4.8 x 10^-7 M, we calculate that
there are approximately 300 BPTIs/GP, i.e.
(4.8 x 10^-7 M BPTI)/(1.6 x 10^-9 M phage). Inhibition
of a specified concentration of trypsin is most easily
measured spectrophotometrically using a peptide-linked
dye (TSCH87).
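Both numbers in the titration example can be reproduced from Avogadro's number and the 1:1 BPTI:trypsin stoichiometry. This minimal sketch assumes that every virion is infective, as the text does. The function names are hypothetical.

```python
AVOGADRO = 6.022e23


def molar_from_titer(pfu_per_ml: float) -> float:
    """Molar phage concentration, assuming every virion is infective."""
    return pfu_per_ml * 1000 / AVOGADRO


def copies_per_phage(inhibited_trypsin_molar: float, phage_molar: float) -> float:
    """With 1:1 BPTI:trypsin binding, inhibited trypsin equals displayed BPTI."""
    return inhibited_trypsin_molar / phage_molar


phage_m = molar_from_titer(1e12)
print(f"{phage_m:.2e} M")                        # 1.66e-09 M
print(round(copies_per_phage(4.8e-7, 1.6e-9)))  # 300
```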

Alternatively, binding to an affinity column may
be used to demonstrate the presence of BPTI on the
surface of phage LG7. An affinity column of 2.0 ml
total volume having BioRad Affi-Gel 10(TM) matrix and
30 mg of AHTrp as affinity material is prepared by the
method of BioRad. The void volume (V_v) of this column
is, by hypothesis, 1.0 ml. This affinity column is
denoted {AHTrp}.

A sample of 10^12 M13am429 is applied to {AHTrp} in
1.0 ml of 10 mM KCl buffered to pH 8.0 with KH2PO4 /
K2HPO4. The column is then washed with the same buffer
until the optical density at 280 nm of the effluent
returns to baseline or 4 x V_v have been passed through
the column, whichever comes first. Samples of LG7 or
LG10 are then applied to the blocked {AHTrp} column at
10^12 pfu/ml in 1.0 ml of the same buffer. The column
is then washed again with the same buffer until the
optical density at 280 nm of the effluent returns to
baseline or 4 x V_v have been passed through, whichever
comes first. Following this wash, a gradient of KCl
from 10 mM to 2 M in 3 x V_v, buffered to pH 8.0 with
phosphate, is passed over the column. The first KCl
gradient is followed by a KCl gradient running from 2 M
to 5 M in 3 x V_v. The second KCl gradient is followed
by a gradient of guanidinium Cl from 0.0 M to 2.0 M in
2 x V_v in 5 M KCl, buffered to pH 8.0 with
phosphate. Fractions of 50 ul are collected and
assayed for phage by plating 4 ul of each fraction at
suitable dilutions on sensitive cells. Retention of
phage on the column is indicated by the appearance of
LG7 phage in fractions that elute significantly later
from the column than control phage LG10 or wtM13. A
successful isolate of LG7 that displays BPTI is
identified, the bpti insert and junctions are
sequenced, and this isolate is used for the further
work described below.
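The elution program consists of three consecutive linear gradients, so the buffer composition in any fraction is a piecewise-linear function of elution volume. This minimal sketch assumes V_v = 1.0 ml and 50 ul fractions, as stated. It also assumes that volume is counted from the start of the first KCl gradient. Holding the final conditions after 8 V_v is an added assumption.

```python
def elution_buffer(volume_ml: float, vv_ml: float = 1.0) -> dict:
    """Molar KCl and guanidinium Cl at a given elution volume.

    Volume is counted from the start of the first KCl gradient and converted
    to void volumes (V_v); the three linear segments follow the text above.
    """
    v = volume_ml / vv_ml
    if v < 0:
        raise ValueError("volume must be non-negative")
    if v <= 3:  # KCl 10 mM -> 2 M over 3 V_v
        return {"KCl": 0.010 + (2.0 - 0.010) * v / 3, "GuCl": 0.0}
    if v <= 6:  # KCl 2 M -> 5 M over 3 V_v
        return {"KCl": 2.0 + (5.0 - 2.0) * (v - 3) / 3, "GuCl": 0.0}
    if v <= 8:  # GuCl 0 -> 2 M over 2 V_v, in 5 M KCl
        return {"KCl": 5.0, "GuCl": 2.0 * (v - 6) / 2}
    return {"KCl": 5.0, "GuCl": 2.0}  # hold at final conditions (assumption)


FRACTION_ML = 0.050
n_fractions = round(8 * 1.0 / FRACTION_ML)  # 8 V_v of gradients in 50 ul fractions
print(n_fractions)                           # 160
for i in (0, 60, 120, 159):
    mid = (i + 0.5) * FRACTION_ML
    print(i, {k: round(c, 3) for k, c in elution_buffer(mid).items()})
```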

If vgDNA is used to obtain a functional fusion
between a BPTI mutant and M13 CP (vide infra), then DNA
from a clonal isolate is sequenced in the regions that
were variegated. Then gratuitous restriction sites for
useful restriction enzymes are removed, if possible, by
silent codon changes. The sequence numbers of residues
in OSP-IPBD will be changed by any insertions;
hereinafter, however, we will denote residues inserted
after residue 23 as 23a, 23b, etc. Insertions after
residue 81 will be denoted as 81a, 81b, etc. This
preserves the numbering of residues between C5 and C55
of BPTI. Residue C5 of BPTI is always denoted as 28 in
the fusion; residue C55 of BPTI is always denoted as 78
in the fusion, and the intervening residues have
constant numbers.
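The insertion-numbering rule can be expressed as a label generator. Insertions after residues 23 and 81 get letter suffixes, and every other residue keeps its AA_seq2 number. The sketch below is a minimal illustration. The function name and the 26-insertion limit are assumptions of this sketch.

```python
import string


def fusion_labels(n_after_23: int = 0, n_after_81: int = 0, length: int = 131) -> list:
    """Residue labels for AA_seq2 after inserting residues after 23 and/or 81.

    Inserted residues get letter suffixes (23a, 23b, ...); all others keep
    their AA_seq2 numbers, so BPTI C5 stays 28 and C55 stays 78.
    """
    if max(n_after_23, n_after_81) > 26:
        raise ValueError("this sketch supports at most 26 letter suffixes")
    labels = []
    for n in range(1, length + 1):
        labels.append(str(n))
        if n == 23:
            labels += [f"23{s}" for s in string.ascii_lowercase[:n_after_23]]
        if n == 81:
            labels += [f"81{s}" for s in string.ascii_lowercase[:n_after_81]]
    return labels


labels = fusion_labels(n_after_23=2, n_after_81=3)
print(labels[22:27])                  # ['23', '23a', '23b', '24', '25']
print(len(labels), labels.index("28"))  # 136 29
```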

Should LG7 phage from cells grown with 10 mM IPTG
fail to display BPTI on their surface, we have several
options. We might try to determine why the
construction failed to work as expected. There are
various possible modes of failure, including: a) BPTI
is not cleaved from the M13 signal sequence, b) BPTI is
cleaved from the M13 CP, and c) the chimeric protein is
made and cleaved after the signal sequence, but the
processed protein is not incorporated into the M13
coat. BPTI has been secreted from E. coli (MARK86);
however, the M13 coat-protein signal sequence was not
used. Therefore problems stemming from the signal
sequence are unlikely, but possible. We could
determine whether BPTI was present in the periplasm or
bound to the inner membrane of LG7-infected cells by
assays using trp* or AHTrp*.

Proteins in the periplasm can be freed through
spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD67, MALA64). If
BPTI were free in the periplasm, it would be found in
the supernatant. Trp* would be mixed with the supernatant
and passed over a non-denaturing molecular sizing
column, and the radioactive fractions collected. The
radioactive fractions would then be analyzed by
SDS-PAGE and examined for BPTI-sized bands by silver
staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts are mixed with AHTrp*
and then either filtered or centrifuged to separate
them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts are analyzed for
extent of AHTrp* binding; alternatively, membrane
proteins are analyzed by western blot analysis.

If BPTI is found free in the periplasm, then we
would expect that the chimeric protein was being
cleaved both between BPTI and the M13 mature coat
sequence and between BPTI and the signal sequence. In
that case, we should alter the BPTI/M13 CP junction by
inserting vgDNA at codons for residues 78-82 of
AA_seq2.

If BPTI is found attached to the inner membrane,
then there are two likely explanations. The first is
that the chimeric protein is being cut after the signal
sequence but is not being incorporated into the LG7
virion; the treatment would also be to insert vgDNA
between residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if the signal sequence is not cleaved.
N-terminal amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If the signal sequence were
being cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If the signal sequence
were not being cleaved, we would vary residues between
23 and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the signal-sequence-to-BPTI
junction. The treatment in this case would be to
vary residues between 23 and 27.
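The troubleshooting logic above reduces to a small decision function. This is only a restatement of the text as code. The function name and the "78-82" and "23-27" labels are hypothetical, and checking the periplasm first when both locations test positive is an added assumption.

```python
from typing import Optional


def junction_to_vary(free_in_periplasm: bool, on_inner_membrane: bool,
                     signal_cleaved: Optional[bool] = None) -> str:
    """Which AA_seq2 junction to variegate, following the text above.

    "78-82" is the BPTI/M13 CP junction; "23-27" is the signal/BPTI junction.
    """
    if free_in_periplasm:  # checked first (assumption if both are observed)
        return "78-82"
    if on_inner_membrane:
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing is needed to decide")
        return "78-82" if signal_cleaved else "23-27"
    return "23-27"  # found in neither place: suspect the signal junction


print(junction_to_vary(False, True, signal_cleaved=False))  # 23-27
```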

Several experiments that introduce variegation
into the bpti-gene VIII fusion are possible, including:

1) 3 variegated codons between residues 78 and 82
using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27
using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82
using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27
using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82
using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27
using olig#15 and olig#14b.
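The size of each library in the list above grows geometrically with the number of variegated codons. Each qfk codon allows 4 x 4 x 2 = 32 DNA triplets, which between them encode all 20 amino acids plus the TAG stop. This minimal sketch counts distinct DNA sequences and stop-free peptide sequences for 3, 5 and 7 codons.

```python
CODONS_PER_QFK = 4 * 4 * 2  # q: 4 bases, f: 4 bases, k: T or G -> 32 codons
AMINO_ACIDS = 20            # qfk codons encode all 20 amino acids (plus TAG stop)


def library_counts(n_codons: int) -> tuple:
    """(distinct DNA sequences, distinct stop-free peptide sequences)."""
    return CODONS_PER_QFK ** n_codons, AMINO_ACIDS ** n_codons


for n in (3, 5, 7):
    dna, peptides = library_counts(n)
    print(f"{n} variegated codons: {dna:.2e} DNA, {peptides:.2e} peptides")
```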

To alter the BPTI-M13 CP junction, we introduce
DNA variegated at codons for residues between 78 and 82
into the Sph I and Sfi I sites of pLG7. The residues
after the last cysteine are highly variable in amino
acid sequences homologous to BPTI, both in composition
and length; in Table 25 these residues are denoted as
G79, G80, and A81. The first part of the M13 CP is
denoted as A82, E83, and G84. One of the oligo-nts
olig#12, olig#12a, or olig#12b and the primer olig#13
are synthesized by standard methods. The oligo-nts
are:

residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-

84  85  86  87  88  89  90  91
GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

82  83  84  85  86  87
GCT|GAA|GGT|GAT|GAT|CCG|-

88  89  90  91
GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12a

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

81c 81d 82  83  84  85  86  87
qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-

88  89  90  91
GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12b



residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'   olig#13

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
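Because q, f and k are defined position by position, the expected frequency of each amino acid at a qfk codon is a sum over the 32 possible triplets. Each triplet contributes the product of its three base fractions. This minimal sketch assumes the standard genetic code and the fractions stated above. Only TAG can give a stop, since k never supplies A at the third position.

```python
from itertools import product

# Base fractions at the three codon positions, as defined for q, f and k.
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_distribution(pos1: dict, pos2: dict, pos3: dict) -> dict:
    """Probability of each amino acid ('*' = stop) encoded by a mixed codon."""
    dist = {}
    for (b1, w1), (b2, w2), (b3, w3) in product(pos1.items(), pos2.items(), pos3.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + w1 * w2 * w3
    return dist


dist = residue_distribution(Q, F, K)
for aa, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(aa, f"{p:.3f}")
```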

The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. The
primer is complementary to the 3' end of each of the
longer oligo-nts. One of the variegated oligo-nts and
the primer olig#13 are combined in equimolar amounts
and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and
RF pLG7 are cut with both Sfi I and Sph I, purified,
mixed, and ligated. This ligation mixture goes through
the process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp.
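The requirement that the primer be complementary to the 3' end of each longer oligo-nt can be checked mechanically. The sketch below compares olig#13 with the constant 3' region of olig#12 (codons 84 to 91 plus the spacer), transcribed from the listing above with the "|" separators removed; the helper names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string, preserving upper/lower case."""
    pairs = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(pairs)[::-1]

# Constant 3' region of olig#12 and the primer olig#13, from the listing above.
OLIG12_3PRIME = "GGTGATGATCCGGCCAAAGCGGCCgcgcc"
OLIG13 = "ggcgcGGCCGCTTTGGCCGGATC"

def primer_anneals_to_3prime_end(template, primer):
    """True if the primer is exactly complementary to the template's 3' end
    (case is ignored, so spacer bases count like any other base)."""
    return template.upper().endswith(reverse_complement(primer).upper())

print(primer_anneals_to_3prime_end(OLIG12_3PRIME, OLIG13))
```

Because olig#12a and olig#12b share the same constant 3' region, the same check covers all three variegated oligo-nts.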

To vary the junction between the M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the Kpn I and Xho I
sites of pLG7. The first three residues are highly
variable in amino acid sequences homologous to BPTI.
Homologous sequences also vary in length at the amino
terminus. One of the oligo-nts olig#14, olig#14a, or
olig#14b and the primer olig#15 are synthesized by
standard methods. The oligo-nts are:

residue 17 18 19 20 21 22 23 24 25
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|-

26 27 28 29 30
qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3' olig#14

residue 17 18 19 20 21 22 23 24 25 26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-

26a 26b 27 28 29 30
qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3' olig#14a

residue 17 18 19 20 21 22 23 24 25 26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-

26a 26b 26c 26d 27 28 29 30
qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3' olig#14b

5' tcg|cgg|gcg|CTC|GAG|ACA|GAA 3' olig#15

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are
spacers. One of the variegated oligo-nts and the
primer are combined in equimolar amounts and annealed.
The dsDNA is completed with all four (nt)TPs and
Klenow fragment. The resulting dsDNA and RF pLG7 are
cut with both Kpn I and Xho I, purified, mixed, and
ligated. This ligation mixture goes through the
process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp or trp.
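Both cloning routes depend on the constant flanks of the oligo-nts carrying the intended restriction sites (Sph I and Sfi I for olig#12, Kpn I and Xho I for olig#14). The sketch below scans the flanks, transcribed from the listings above, for those recognition sequences. The site strings are the standard recognition sequences; the helper name and the simple "N means any base" regex translation are our own illustrative choices, not a library API.

```python
import re

# Recognition sequences; "N" stands for any base.
SITES = {
    "Sph I": "GCATGC",
    "Sfi I": "GGCCNNNNNGGCC",
    "Kpn I": "GGTACC",
    "Xho I": "CTCGAG",
}

def site_positions(seq, site):
    """0-based start positions of a recognition site in seq (case-insensitive).
    A lookahead is used so that overlapping matches are also reported."""
    pattern = site.replace("N", "[ACGT]")
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]

# Constant (non-variegated) flanks transcribed from the oligo-nt listings.
FLANKS = {
    "olig#12 5' flank": "gcgagcGCATGCGTACCTGC",
    "olig#12 3' flank": "GCTGAAGGTGATGATCCGGCCAAAGCGGCCgcgcc",
    "olig#14 5' flank": "ggccgcGGTACCGATGCTGTCTTTTGCT",
    "olig#14 3' flank": "TTCTGTCTCGAGcgcccgcga",
}

for name, seq in FLANKS.items():
    hits = {enz: site_positions(seq, s) for enz, s in SITES.items()}
    print(name, {enz: pos for enz, pos in hits.items() if pos})
```

The same scan, run over the full variegated oligo-nt, would also flag gratuitous sites created by particular choices at the qfk positions.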
If none of these approaches produces a working
chimeric protein, we may try a different signal
sequence, a different OSP in M13 (e.g., the gene III
protein, for which there are fusion data (SMIT85,
CRUZ88)), or another genetic package.

Example 1, Part II

BPTI binds very tightly to trypsin
(Kd = 6.0 x 10^-14 M) and to anhydrotrypsin, so that
these molecules are not preferred for optimizing the
amount of BPTI to display on LG7 or the amount of
affinity molecule to attach to the column. Tschesche
et al. reported on the binding of several BPTI
derivatives to various proteases:

Dissociation constants (molar) for BPTI derivatives

Residue Trypsin Chymotrypsin
#15 (bovine (bovine
pancreas) pancreas)

lysine 6.0 x 10"^^ 9.0 x 10"^ 

glycine - 

alanine + - 

valine 

leucine - -

From the report of Tschesche et al., we infer that
molecular pairs marked "+" have Kds greater than
3.5 x 10^-6 M and that molecular pairs marked "-" have
Kds much greater than 3.5 x 10^-6 M. Because of the
wealth of data about the binding of BPTI and various
mutants to trypsin and other proteases (TSCH87), we can
proceed in various ways. (For other PBDs we can obtain



Elastase 
(porcine 
pancreas) 

+ 

2.8 X 10*3 
5.7 X 10"S 

1.9 X 10"*^ 



Elastase 
(human 
leukocytes) 

3.5 X 10"^ 

7.0 X ID"^ 
2-5 X 10""^ 

1.1 xl0"^0 
2,9 X 10^^ 
two different monoclonal antibodies, one with a high
affinity, having Kd of order 10""^^ M, and one with a
moderate affinity having Kd on the order of 10""^ M.)
In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLEl), b)
the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTI(A15) (residue 38
in the osp-ipbd gene) to trypsin (weak but detectable)
or to porcine pancreatic elastase.

We compare the retention of LG7 virions to the
retention of wild-type M13 on {AHTrp}. M13 derivatives
having more DNA than wild-type M13 have correspondingly
longer virions. Thus we will create pLG8, which differs
from pLG7 only in having stop codons at codons 2 and
3, and an altered L codon at codon 7 of the osp-ipbd
gene. Phage LG8 will have exactly as much DNA as LG7;
therefore the LG8 virion is exactly as long as the LG7
virion. LG8 cannot, however, display BPTI on its
surface.

To expedite identification of different M13-derived
phage, we replace the amp^R gene of LG8 with the
tet^R gene from pBR322 by standard methods. The
Bsm I-to-Aat II, tet^R-bearing fragment of pBR322 is
ligated into DNA from pLG8 cut with Xba I and Aat II.
The correct construction, having 9.2 kb, is easily
distinguished from pBR322 and is called LG10.

The phage LG7 is grown at various levels of IPTG
in the medium and harvested in the way previously
described. An affinity column having a bed volume of
2.0 ml and supporting an amount of HuLEl picked from
the range 0.1 mg to 30.0 mg on 1 ml of BioRad
Affi-Gel 10 (™) or Affi-Gel 15 (™) is designated {HuLEl}.
An appropriate set of densities of HuLEl on the column
is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 8.0 mg/ml, 15.0
mg/ml, and 30.0 mg/ml). The Vv of {HuLEl} is, by
hypothesis, 1.0 ml. The elution of LG7 phage is
compared to the elution of LG10 on {HuLEl} having
varying amounts of HuLEl affixed. The columns are
eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x Vv, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0.
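To make the elution program concrete, the sketch below returns the KCl and guanidinium Cl concentrations expected at a given volume eluted since the start of step 2. Step 1 is omitted because its length depends on when the OD returns to base line. It assumes perfectly linear gradients and ignores the dead volume of the gradient maker and column; the function name is hypothetical.

```python
def eluent_at(volume_ml, vv_ml=1.0):
    """Return (KCl molar, guanidinium Cl molar) at a volume eluted since the
    start of step 2 of the standard program.

    Assumes each gradient is perfectly linear in volume and ignores the
    dead volume of the gradient maker and column.
    """
    v = volume_ml / vv_ml  # volume in units of the column void volume Vv
    if not 0.0 <= v <= 8.0:
        raise ValueError("outside steps 2-4 (0 to 8 x Vv)")
    if v <= 3.0:                        # step 2: 10 mM -> 2 M KCl over 3 x Vv
        return 0.010 + (2.0 - 0.010) * v / 3.0, 0.0
    if v <= 6.0:                        # step 3: 2 M -> 5 M KCl over 3 x Vv
        return 2.0 + (5.0 - 2.0) * (v - 3.0) / 3.0, 0.0
    return 5.0, 0.8 * (v - 6.0) / 2.0   # step 4: 5 M KCl, 0 -> 0.8 M GdmCl over 2 x Vv

for v in (0.0, 1.5, 3.0, 4.5, 7.0, 8.0):
    kcl, gdm = eluent_at(v)
    print(f"{v:4.1f} ml: {kcl:.3f} M KCl, {gdm:.2f} M guanidinium Cl")
```

With Vv = 1.0 ml, the position of an elution peak in ml after the start of step 2 maps directly onto the salt concentration at which the phage were released.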

The preferred level of induction (IPTGoptimal) and
amount of affinity molecule on the matrix
(DoAMoMoptimal) are those settings that give the
sharpest LG7 elution peak that shows significant
retardation as compared to LG8, which carries no BPTI.
By hypothesis, the best separation occurs for the
amount of BPTI/GP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLEl/ml is applied
to BioRad Affi-Gel 10 (™).

When the amount of BPTI/GP and the amount of
HuLEl/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength,
and the amount of GP/(volume of support). These
Using optimal BPTI/GP and HuLEl/volume of support,
we measure the elution volume of LG7 and LG8 for five
different elution rates, viz. 1, 1/2, 1/4, 1/8, and
1/16 times the maximum flow rate. By hypothesis, 1/4 of
maximum elution rate is better than 1/2, but 1/8 is
about the same as 1/4. Therefore 1/4 maximum elution
rate will be used.

Elution volumes of LG7 obtained from cells grown
on media that is 2.0 mM in IPTG are measured at optimal
DoAMoM and elution rate for loadings of 10^, 10^^,
10^^, and 10^^ pfu. By hypothesis, 10^^ pfu of pure
LG7 overloads the column and a significant number of
phage elute before their characteristic position in the
KCl gradient. We also find that 10^^ pfu overloads the
column only slightly, and that 10^^ pfu does not
overload the column. Because the use of the affinity
separation in Sec. 15 will involve a population in
which no single member is more than one part in 10^, we
conclude that 10^^ pfu of a variegated population could
be applied to a column of 1.0 ml matrix volume without
overloading with respect to any one species. The
overloading of a 1.0 ml column by 10^^ pfu also
indicates that the initial column that captures
indiscriminately adhesive phage should be 5 to 10 times
as large as the column that supports the target
material.

Elution volumes of LG7 and LG10 obtained from
cells grown on media that is 2.0 mM in IPTG are
measured at optimal conditions and for a loading of
10^*^ pfu for various initial ionic strengths: 1.0 mM,
5.0 mM, 10.0 mM, 20.0 mM, and 50.0 mM. We may find,
for example, that LG10 is slightly retarded by the
column when loaded at 1.0 mM KCl, but that LG7 always
comes off the column at its characteristic place in the
gradient. We use 10.0 mM as initial ionic strength in
all remaining affinity separations.

To determine the sensitivity of chromatography of
phage that display variants of BPTI on their surfaces
(Sec. 10.1), we prepare artificial mixtures of two
closely related phage that differ only at one residue
in the BPTI domain. One variety of phage has strong
affinity for the column used in this step, while the
other phage has no affinity for the column. We
chromatograph these mixtures to discover how little of
the phage that binds to the column can be detected
within a large majority of phage that do not bind the
column.

For these tests we choose AHTrp as AfM(BPTI). A
column having 2 ml bed volume is prepared with
(DoAMoMoptimal AHTrp)/(ml of Affi-Gel 10 (™)).
The column is called {AHTrp} and has Vv = 1.0 ml.

A new phage, LG9, is prepared that displays
BPTI(V15) as IPBD, in contrast to LG7, which displays
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is
residue 38 of the osp-ipbd gene. We introduce the
change K38 to V by replacement of a short segment of
the osp-ipbd gene between Apa I and Stu I. The correct
construction is called pLG9. To expedite
differentiation between LG7 and an LG9-derivative
phage, we replace the amp^R gene of LG9 with the tet^R
gene from pBR322. DNA from pBR322 between Bsm I (1353,
blunted) and Aat II (1428) is ligated to dsDNA from pLG9
cut with Xba I (blunted) and Aat II. The correct
construction, having 9.2 kb, is easily distinguished
from pBR322 and is called LG11. DNA from phage LG11 is
sequenced in the vicinity of the junctions of the newly
inserted tet^R gene to confirm the construction.

LG7 and LG11 are grown with optimum IPTG (2.0 mM)
and harvested. Mixtures are prepared in the ratios

LG7:LG11 :: 1:Vlim

where Vlim ranges from 10^^ to 10^ by factors of 10.
Large values of Vlim are tested first; once a Vlim is
found that allows recovery of LG7, smaller values of
Vlim need not be tested.



The column {AHTrp} is first blocked by treatment
with 10^^ virions of M13am429 in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD280 returns to base line
or 4 x Vv have passed through the column, whichever
comes first. One of the mixtures of LG7 and LG11
containing 10^^ pfu in 1 ml of the same buffer is
applied to {AHTrp}. The column is eluted in a standard
way:



1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x Vv, whichever is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate (30 x 100 ul fractions),

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0 (30 x 100 ul fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0 (20 x
100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in
1.2 x Vv, with phosphate buffer to pH 8.0 (12 x
100 ul fractions).
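The fraction counts in this program follow from the step volumes: with Vv = 1.0 ml, 3 x Vv is thirty 100 ul fractions, 2 x Vv is twenty, and 1.2 x Vv is twelve. The sketch below numbers the fractions consecutively and checks that each step's count matches its volume. The data layout and function name are our own illustration.

```python
# Elution program above: (step, volume in units of Vv, number of fractions).
# Step 1 effluent is discarded, so it has no fractions.
VV_ML = 1.0          # void volume of {AHTrp}, per the text
FRACTION_ML = 0.100  # 100 ul fractions

PROGRAM = [
    ("2) 10 mM -> 2 M KCl gradient", 3.0, 30),
    ("3) 2 M -> 5 M KCl gradient", 3.0, 30),
    ("4) 5 M KCl, 0 -> 0.8 M guanidinium Cl", 2.0, 20),
    ("5) 5 M KCl, 0.8 M guanidinium Cl", 1.2, 12),
]

def fraction_table(program, vv_ml=VV_ML, fraction_ml=FRACTION_ML):
    """Number the collected fractions consecutively and check that each
    step's fraction count matches its programmed volume."""
    rows, first = [], 1
    for name, vv_multiple, n_fractions in program:
        expected = round(vv_multiple * vv_ml / fraction_ml)
        if expected != n_fractions:
            raise ValueError(f"{name}: volume implies {expected} fractions")
        rows.append((name, first, first + n_fractions - 1))
        first += n_fractions
    return rows

for name, start, end in fraction_table(PROGRAM):
    print(f"fractions {start:3d}-{end:3d}: {name}")
```

A table like this makes it straightforward to relate a plaque-positive fraction number back to the eluent conditions.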

Samples of 4 ul from each fraction are plated at
suitable dilution on phage-sensitive Sup- cells (so
that M13am429 will not grow). A sample of the column
matrix is also used as inoculum for phage-sensitive
Sup- cells. Plaques are transferred to ampicillin-containing
LB agar, and Amp^R colonies are tested for
display of BPTI(K15) by use of trp* or AHTrp*.

By hypothesis, Vlim = 4.0 x 10^8 is the largest
value for which LG7 can be recovered. Thus Csensi =
4.0 x 10^8. Three cycles of chromatography are required
to isolate LG7, so the first approximation to Ceff is
740 (= exp(loge(4.0 x 10^8)/3)).
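The first approximation to Ceff is the per-cycle enrichment that, compounded over the required number of cycles, reaches Csensi, i.e. the cube root of 4.0 x 10^8 for three cycles. A minimal sketch of that arithmetic follows; the function name is hypothetical.

```python
import math

def per_cycle_enrichment(c_sensi, cycles):
    """Geometric-mean enrichment per cycle needed to reach c_sensi in the
    given number of cycles: exp(ln(c_sensi) / cycles)."""
    return math.exp(math.log(c_sensi) / cycles)

c_eff = per_cycle_enrichment(4.0e8, 3)
print(round(c_eff))  # 737, which the text rounds to 740
```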

We now determine the efficiency of the affinity
separation (Sec. 10.2). This is done by: a) preparing
mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching
the population for LG7 for one separation cycle, and c)
determining the fraction of LG7 in the last phage-bearing
fraction. When Q is 1.5 x 10"*, 3% of colonies
are BPTI positive. When Q is 1.5 x 10^3, 60% of the
colonies are BPTI positive. Thus we calculate Ceff =
0.60 x 1.5 x 10^3 = 900.
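The estimate Ceff = 0.60 x 1.5 x 10^3 multiplies the observed fraction of LG7 by Q. For comparison only, the sketch also computes the exact change in the LG7:LG11 ratio, from 1:Q before the cycle to f:(1 - f) after it; when f is not small (here f = 0.60) the first-order estimate used in the text is the more conservative of the two. Both function names are illustrative.

```python
def ceff_first_order(fraction_wanted, q):
    """Estimate used in the text: a 1:q input that yields a fraction f of
    the wanted phage after one cycle gives Ceff ~= f * q."""
    return fraction_wanted * q

def ceff_odds_ratio(fraction_wanted, q):
    """Exact change in the wanted:unwanted ratio, from 1:q to f:(1 - f)."""
    return fraction_wanted / (1.0 - fraction_wanted) * q

print(round(ceff_first_order(0.60, 1.5e3)))  # 900, the value in the text
print(round(ceff_odds_ratio(0.60, 1.5e3)))   # 2250
```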
Our hypothetical LG7 should display one or more
BPTI domains on each virion. The osp-ipbd gene is
under control of the lacUV5 promoter so that expression
levels of BPTI-M13 CP can be manipulated via [IPTG].
This construct may be used to develop many different
binding proteins, all based on BPTI. An optimum level
of induction and amount of AfM(PBD) (= DoAMoMoptimum =
2.0 mg/(ml of support)) should have been determined;
target molecules will be applied to columns in this
amount in the process disclosed in Sec. 15.1. These
optimum levels may be adequate for all targets and all
variegations of BPTI displayed on derivatives of M13
based on LG7, but some further optimization may be
needed if other values of pH or temperature are used.

Other pbd gene fragments may be substituted for
the bpti gene fragment in pLG7 with a high likelihood
that the PBD will appear on the surface of the new LG7
derivative.

Example 1, Part III

HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it
is not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70°C, between pH 4.4 and
9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, Mr = 16,000, 5)
HHMb requires haem, and 6) HHMb has no proteolytic
activity.

In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb, 2) the 3D structure of sperm whale myoglobin
(HHMb has 19 amino acid differences, and it is generally
assumed that the 3D structures are almost identical),
3) its lack of enzymatic activity, and 4) its lack of
toxicity.

We set the specifications of an SBD as:

1) T = 25°C,

2) pH = 8.0,

3) Acceptable solutes:

A) for binding:

i) phosphate, as buffer, 0 to 20 mM, and

ii) KCl, 10 mM,

B) for column elution:

i) phosphate, as buffer, 0 to 30 mM,

ii) KCl, up to 5 M, and

iii) guanidinium Cl, up to 0.8 M, and

4) Acceptable Kd < 1.0 x 10"^ M.

We choose LG7 as GP(IPBD).

Residues to be varied are picked, in part, through
the use of interactive computer graphics to visualize
the structures. In this section, all residue numbers
refer to BPTI. We pick a set of residues that forms a
surface such that all residues can contact one target
molecule. Information relevant to choosing BPTI
residues to vary includes: 1) the 3D structure, 2)
solvent accessibility of each residue (LEEB71), 3) a
compilation of sequences of other proteins homologous
to BPTI, and 4) knowledge of the structural nature of
different amino acid types.

Tables 16 and 34 indicate which residues of BPTI
a) have substantial surface exposure, and b) are known
to tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and
variable and such that all members of one set can touch
a molecule of the target material at one time. If BPTI
has a small amino acid at a given residue, that amino
acid may not be able to contact the target
simultaneously with all the other residues in the
interaction set, but a larger amino acid might well
make contact. A charged amino acid might affect
binding without making direct contact. In such cases,
the residue should be included in the interaction set,
with a notation that larger residues might be useful.
In a similar way, large amino acids near the geometric
center of the interaction set may prevent residues on
either side of the large central residue from making
simultaneous contact. If a small amino acid, however,
were substituted for the large amino acid, then the
surface would become flatter and residues on either
side could make simultaneous contact. Such a residue
should be included in the interaction set with a
notation that small amino acids may be useful.

Table 35 was prepared from standard model parts
and shows the maximum span between Cbeta and the tip of
each type of side group. Cbeta is used because it is
rigidly attached to the protein main chain; rotation
about the Calpha-Cbeta bond is the most important
degree of freedom for determining the location of the
side group.

Table 34 indicates five surfaces that meet the
given criteria. The first surface comprises the set of
residues that contacts trypsin in the complex of
trypsin with BPTI as reported in the Brookhaven Protein
Data Bank entry "1TPA". This set is indicated by the
number "1". The exposed surface of the residues in
this set (taken from Table 16) totals 1148 Å² and
approximates the area of contact between BPTI and
trypsin.

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was
defined. The choice of sets of residues shown in Table
34 is in no way exhaustive or unique; other sets of
variable surface residues can be picked. Hereinafter
we refer to K15 as being at the top of the molecule,
while the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would
be when substituted into the structure of BPTI.

To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the
residues in set #2 is completely conserved in the
sample of sequences reported in Table 34; thus we can
vary them with a high probability of retaining the
underlying structure. Independent substitution at each
of these twelve residues of the amino acid types
observed at that residue would produce approximately
4.4 x 10^ amino acid sequences and the same number of
surfaces.

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues, so the high frequency of arginine and
lysine residues may reflect bias in isolation and is
not necessarily required by the structure. Indeed,
SCI-III from Bombyx mori contains seven more acidic
than basic groups (SASA84).

Residue 17 is highly variable and fully exposed
and can contain A, H, F, L, T, G, Y, or S. All types
of amino acids are seen: large, small, charged,
neutral, and hydrophobic. That no acidic groups are
observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y
in 31 of 33 cases and I and W in the remaining cases.
The side group of Y21 fills the space between T32 and
the main chain of residues 47 and 48. The OH at the
tip of the Y side group projects into the solvent.
Clearly one can vary the surface by substituting Y or F
so that the surface is either hydrophilic (Y) or
hydrophobic (F) in that region. It is also possible that
the other aromatic amino acid (viz. W) or the other
hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but L and
T are also observed. On structural grounds, this
residue will probably tolerate any hydrophilic amino
acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a
turn, but is not in a conformation peculiar to glycine.
Six other types of amino acids have been observed at
this residue, including H, Q, and R. Small side groups
at this residue might not contact HHMb simultaneously
with residues 17 and 34. Large side groups could
interact with HHMb at the same time as residues 17 and
34. Charged side groups at this residue could affect
binding of HHMb on the surface defined by the other
residues of the principal set. Any amino acid, except
perhaps P, should be tolerated.

Residue 29 is highly variable, most often
containing L. This fully exposed position will
probably tolerate almost any amino acid except,
perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid,
except perhaps P, might be tolerated.
Now we consider possible variation of the
secondary set (Sec. 13.1.2) of residues that are in the
neighborhood of the principal set. Neighboring
residues that might be varied at later stages include
9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N),
26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are
separated by a bulge caused by the ascending chain from
residue 31 to 34. For residue 9 and residues 48 and 49
to contribute simultaneously to binding, either the
target must have a groove into which the chain from 31
to 34 can fit, or all three residues (9, 48, and 49)
must have large amino acids that effectively reduce the
radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and
exposed. Residue 11, like residue 9, lies somewhat far
from the surface defined by the principal residues and
will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in
charge at this residue could affect binding on the face
defined by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
and T. Only T is hydrophilic. The side group points
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this
residue could affect binding at the surface defined by
residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, L, and
Q. The side group points directly away from the
surface defined by residue set #2. Alteration of the
charge at this residue could affect binding at the
surface defined by residue set #2.

Residue 22 is only slightly varied, being F, Y, or
H in 30 of 33 cases. Nevertheless, A and S have also
been observed at this residue. Amino acids such as
K, I, or Q could be tried here. Alterations at residue
22 may affect the mobility of residue 21; changes in
charge at residue 22 could affect binding at the
surface defined by residue set #2.

Residue 24 shows some variation, but probably cannot
interact with one molecule of the target
simultaneously with all the residues in the principal
set. Variation in charge at this residue might have an
effect on binding at the surface defined by the
principal set.

Residue 26 is highly varied and exposed. Changes
in charge may affect binding at the surface defined by
residue set #2; substitutions may affect the mobility
of residue 27, which is in the principal set.

Residue 35 is most often Y; W has been observed.
The side group of 35 is buried, but substitution of F
or W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample
used. The Ogamma probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless,
there is no overwhelming steric reason to preclude
other amino acid types at this residue. In particular,
other amino acids whose side groups can accept
hydrogen bonds, viz. N, D, Q, and E, may be acceptable
here.

Residue 50 is often an acidic amino acid, but 
other amino acids are possible. 

Residue 53 is often R, but other amino acids have
been observed at this residue. Changes of charge may 
affect binding to the amino acids in interaction set 
#2. 

From published models (HUBE77, WLOD84) one can see
that R39 is on the opposite side of BPTI from the 
surface defined by the residues in set #2. Therefore, 
variation at residue 39 at the same time as variation 
of some residues in set #2 is much less likely to 
improve binding that occurs along surface #2 than is 
variation of the other residues in set #2. 

In addition to the twelve principal residues and
13 secondary residues, there are two other residues,
30(C) and 33(F), involved in surface #2 that we will
probably not vary, at least not until late in the
procedure. These residues have their side groups
buried inside BPTI and are conserved. Changing these
residues does not change the surface nearly so much as
does changing residues in the principal set. These
buried, conserved residues do, however, contribute to
the surface area of surface #2. The surface of residue
set #2 is comparable to the area of the trypsin-binding
surface. Principal residues 17, 19, 21, 27, 28, 29,
31, 32, 34, 48, 49, and 52 have a combined
solvent-accessible area of 946.9 Å². Secondary residues 9, 11,
15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have a
combined surface of 1041.7 Å². Residues 30 and 33 have
exposed surface totaling 38.2 Å². Thus the three
groups' combined surface is 2026.8 Å².
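The bookkeeping for surface #2 can be checked directly: the principal and secondary lists should contain twelve and thirteen residues without overlap, and the three quoted areas should sum to 2026.8 Å². A minimal sketch using only the numbers stated above:

```python
PRINCIPAL = [17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, 52]
SECONDARY = [9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47, 50, 53]
BURIED = [30, 33]

# Combined solvent-accessible areas (square Angstroms) quoted in the text.
AREA = {"principal": 946.9, "secondary": 1041.7, "buried": 38.2}

assert len(PRINCIPAL) == 12 and len(SECONDARY) == 13
assert not set(PRINCIPAL) & set(SECONDARY)  # the groups do not overlap
total = round(sum(AREA.values()), 1)
print(total)  # 2026.8
```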

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however,
that C14/C38 is conserved in all natural sequences, yet
Marks et al. (MARK87) showed that changing both C14 and
C38 to A,A or T,T yields a functional trypsin
inhibitor. Thus it is possible that BPTI-like
molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of Y, M, H, or L might be
tolerated.

Given our hypothetical affinity separation
sensitivity, Csensi, we decide to vary six residues,
leaving some margin for errors in the actual base
composition of variegated bases. To obtain maximal
recognition, we choose residues from the principal set
that are as far apart as possible. Table 36 shows the
distances between the beta carbons of residues in the
principal and peripheral sets. R17 and V34 are at one
end of the principal surface. Residues A27, G28, L29,
A48, E49, and M52 are at the other end, about twenty
Angstroms away. Of the residues at the two ends, we
will vary 17, 27, 29, 34, and 48. Residues 28, 49, and
52 will be varied at later rounds.

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.
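The residues above were chosen by inspecting Table 36. One way to automate the "as far apart as possible" criterion is the greedy max-min (farthest-point) heuristic sketched below. It is a swapped-in technique, not the procedure used here. The coordinates are invented placeholders, not BPTI Cbeta positions, so the output only illustrates the mechanics.

```python
import math

def farthest_point_subset(coords, k, seed):
    """Greedy max-min (farthest-point) selection of k residues.

    coords: {residue_id: (x, y, z)} for Cbeta atoms, in Angstroms.
    seed: residue to start from. Each step adds the residue whose nearest
    already-chosen residue is farthest away.
    """
    if not 1 <= k <= len(coords):
        raise ValueError("k must be between 1 and the number of residues")
    chosen = [seed]
    while len(chosen) < k:
        best = max(
            (r for r in coords if r not in chosen),
            key=lambda r: min(math.dist(coords[r], coords[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical Cbeta coordinates, NOT taken from a BPTI structure.
toy = {
    17: (0.0, 0.0, 0.0),
    34: (2.0, 3.0, 0.0),
    27: (20.0, 0.0, 0.0),
    29: (18.0, 4.0, 1.0),
    48: (19.0, -4.0, 2.0),
    19: (8.0, 6.0, 0.0),
}
print(farthest_point_subset(toy, 3, seed=17))
```

With real coordinates, the result would still need the structural judgments described above (side-group size, charge, mobility), which a distance criterion alone does not capture.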

Unlimited variation of six residues produces 6.4 x
10^7 amino acid sequences. By hypothesis, Csensi is 1
in 4 x 10^8. Table 37 shows the programmed variegation
at the chosen residues. The parental sequence is
present as 1 part in 5.5 x 10^, but the least favored
sequences are present at only 1 part in 4.2 x 10^.
Among single-amino-acid substitutions from the PPBD,
the least favored is P17-I19-A27-L29-V34-A48 and has a
calculated abundance of 1 part in 1.6 x 10^. Using the
optimal qfk codon, we can recover the parental sequence
and all one-amino-acid substitutions to the PPBD if
actual nt compositions come within 5% of programmed
compositions. The number of transformants is 1.0 x 10^
(also by hypothesis); thus we will produce most of the
programmed sequences.
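The figure of 6.4 x 10^7 is simply 20^6. Whether a particular sequence is actually present among the transformants depends on its abundance p and the number of transformants N. A common simplification, assumed here and not taken from the text, treats each transformant as an independent draw, so the chance of seeing a sequence at least once is 1 - exp(-N p). The values in the usage line are hypothetical.

```python
import math

def sequence_space(n_residues, alphabet=20):
    """Number of amino acid sequences when n residues vary over all 20 types."""
    return alphabet ** n_residues

def p_sampled(abundance, n_transformants):
    """Probability that a sequence at the given fractional abundance appears
    at least once among n independent transformants (Poisson approximation)."""
    return 1.0 - math.exp(-abundance * n_transformants)

print(f"{sequence_space(6):.2e}")  # 6.40e+07
# Hypothetical values for illustration only:
print(round(p_sampled(1e-8, 1e8), 3))
```

Under this approximation, a sequence whose abundance equals 1/N is represented with probability of only about 63%, which is why the least favored sequences matter when sizing the transformation.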

The residue numbers above refer to mature BPTI.
Since Table 25 refers to the pre-M13CP-BPTI protein,
all mature BPTI sequence numbers have been increased by
the length of the signal sequence, 23. Thus, we wish
to vary residues 40, 42, 50, 52, 57, and 71. A DNA
subsequence containing all these codons is found
between the Apa I site at base 191 and the Sph I site
at base 309 of the osp-ipbd gene. Among Apa I, Dra II,
and Pss I, Apa I is preferred because it recognizes six
bases without any ambiguity and will cut fewer
sequences in the vgDNA. Gratuitous restriction sites
can be avoided in some cases by use of codon ambiguity:
changing the codon for G51 from GGC to GGT makes it
impossible to generate an Apa I site at codons 50, 51,
and 52.
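Two steps in this paragraph can be checked mechanically: the shift from mature BPTI numbering to precursor numbering (+23 for the signal sequence), and the claim that GGT at codon 51 rules out an Apa I site (GGGCCC) in the window formed by codons 50, 51, and 52 when 50 and 52 are qfk codons. The sketch below enumerates every base choice the qfk mixtures allow. It considers only that 9-base window, as the text does, and the helper names are hypothetical.

```python
from itertools import product

SIGNAL_LENGTH = 23  # residues in the M13 signal sequence, per the text

def to_precursor(mature_residues):
    """Shift mature BPTI residue numbers to pre-M13CP-BPTI numbering."""
    return [r + SIGNAL_LENGTH for r in mature_residues]

# Bases allowed at each position of a qfk codon: q and f are 4-base
# mixtures, k is T or G.
QFK = ("TCAG", "TCAG", "TG")
APA_I = "GGGCCC"

def can_form_site(codon_51, site=APA_I):
    """True if some choice of bases in the flanking qfk codons 50 and 52
    creates the site within the 9-base window codon50-codon51-codon52."""
    for left in product(*QFK):
        for right in product(*QFK):
            window = "".join(left) + codon_51 + "".join(right)
            if site in window:
                return True
    return False

print(to_precursor([17, 19, 27, 29, 34, 48]))
print(can_form_site("GGC"), can_form_site("GGT"))
```

With GGC, a codon 50 ending in G followed by a codon 52 beginning CC completes GGGCCC; with GGT, the T always interrupts the site.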

Each piece of dsDNA to be synthesized needs six to eight bases added at either end to allow cutting with restriction enzymes and is shown in Table 37. The first synthetic base (before cutting with ApaI and SphI) is 184 and the last is 322. There are 142 bases to be synthesized. The center of the piece to be synthesized lies between Q54 and V57. The overlap cannot include varied bases, so we choose bases 245 to 256 as the overlap, which is 12 bases long. Note that the codon for F56 has been changed to TTC to increase the GC content of the overlap. The amino acids that are being varied are marked as X with a plus over them. Codons 57 and 71 are synthesized on the sense (bottom) strand. The design calls for "qfk" in the antisense strand, so that the sense strand contains (from 5' to 3') a) equal parts C and A (i.e., the complement of k), b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e., the complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and 0.18 G) (i.e., the complement of q).
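The sense-strand mixtures are the reverse complement of the antisense vgCodon: each position is complemented and the order is reversed, so k comes first when reading 5' to 3'. The sketch below shows that bookkeeping. The q and f values are an assumption inferred by complementing the mixtures listed above; k is taken as an equal mixture of G and T.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Antisense-strand vgCodon "qfk" (fraction of each base). q and f are inferred
# (illustrative assumption) by complementing the sense-strand mixtures in the
# text; k is an equal mixture of G and T.
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "C": 0.00, "A": 0.00, "G": 0.50}


def complement_mix(mix):
    """Base mixture that pairs with `mix` on the opposite strand."""
    return {COMPLEMENT[base]: frac for base, frac in mix.items()}


def opposite_strand(codon_mixes):
    """Reverse complement of a degenerate codon: complement every position and
    reverse the order, so the result reads 5' to 3' on the other strand."""
    return [complement_mix(mix) for mix in reversed(codon_mixes)]


sense = opposite_strand([Q, F, K])
for position, mix in enumerate(sense, start=1):
    print(position, {base: mix[base] for base in "TACG" if mix[base] > 0})
```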

Each residue that is encoded by "qfk" has 21 possible outcomes: each of the 20 amino acids plus stop. Table 12 gives the distribution of amino acids encoded by "qfk", assuming 5% errors. The abundance of the parental sequence is the product of the abundances of R, I, A, L, V, and A. The abundance of the least-favored sequence is 1 in 4.2 x 10^9.
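Both abundances follow from Table 12 if the six varied codons are treated as independent. The sketch below multiplies the per-residue abundances copied from Table 12; it gives roughly 1 in 5.7 x 10^7 for the parental sequence (R40, I42, A50, L52, V57, A71) and 1 in 4.2 x 10^9 for six copies of the least-favored amino acid, F.

```python
import math

# Amino-acid abundances for "qfk" with 5% synthesis errors (from Table 12;
# only the residues needed here are listed).
TABLE12 = {"R": 0.0768, "I": 0.0271, "A": 0.0459, "L": 0.0671,
           "V": 0.0600, "F": 0.0249}


def sequence_abundance(residues):
    """Abundance of one sequence: product of per-residue abundances,
    assuming the varied codons are synthesized independently."""
    return math.prod(TABLE12[r] for r in residues)


parental = sequence_abundance("RIALVA")       # R40, I42, A50, L52, V57, A71
least_favored = sequence_abundance("FFFFFF")  # F is the least-favored residue
print(f"parental:      1 in {1 / parental:.2g}")
print(f"least-favored: 1 in {1 / least_favored:.2g}")
```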

Olig#27 and olig#28 are annealed and extended with Klenow fragment and all four dNTPs. Both the ds synthetic DNA and RF pLG7 DNA are cut with both ApaI and SphI. The cut DNA is purified and the appropriate pieces ligated (See Sec. 14.1) and used to transform competent PE383 (Sec. 14.2). In order to generate a sufficient number of transformants, we start with 5.0 l of cells:

1) culture E. coli in 5.0 l of LB broth at 37°C until cell density reaches 5 x 10^? to 7 x 10^? cells/ml,

2) chill on ice for 65 minutes; centrifuge the cell suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in 1667 ml of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at 4000g for 5 minutes at 4°C,

5) resuspend cells in 2 x 400 ml of ice-cold, sterile 60 mM CaCl2; store cells at 4°C for 24 hours,

6) add DNA (100 μg) in 20 ml of ligation or TE buffer; mix, incubate on ice for ... minutes,

7) distribute into 200 μl aliquots and heat shock cells at 42°C for 20 seconds,

8) add 200 ml LB broth and incubate at 37°C for 1 hour,

9) add the culture to 2.0 l of LB broth containing ampicillin at 35-100 μg/ml and culture overnight at 37°C,

10) after 6 hours, remove 200 ml and plate 0.5 ml portions with log phase JM107 on LB agar, using the soft-agar overlay technique; phage are prepared from the soft agar,

11) centrifuge the overnight culture to remove cells, and pellet phage (MESS83),

12) harvest virions by the method of Salivar et al. (SALI64).



It is important to: a) use all or nearly all the vgDNA synthesized in ligation, b) use all or nearly all the ligation mixture to transform cells, and c) culture all or nearly all the transformants. These measures are directed at maintaining diversity.

It is important to collect virions in a way that samples all or nearly all the transformants. Because F- cells are used in the transformation, multiple infections do not pose a problem in the overnight phage production. F+ cells are used for phage production in agar.
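Why "all or nearly all" matters can be seen with a standard Poisson approximation (an illustration, not a calculation from this document): if N items are drawn at random from M equally abundant transformants, the expected fraction of distinct transformants represented is 1 - exp(-N/M). The library size below is a hypothetical value chosen only for the printout.

```python
import math


def fraction_sampled(n_drawn: float, n_clones: float) -> float:
    """Expected fraction of distinct, equally abundant clones seen at least
    once when n_drawn items are sampled at random (Poisson approximation)."""
    return 1.0 - math.exp(-n_drawn / n_clones)


N_CLONES = 1e8  # hypothetical number of transformants, for illustration only
for fold in (0.5, 1, 3, 10):
    covered = fraction_sampled(fold * N_CLONES, N_CLONES)
    print(f"{fold:>4}x sampling -> {covered:.3f} of transformants represented")
```

Sampling exactly as many items as there are transformants still misses about 37% of them, which is why every step that discards material costs diversity.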

HHMb has a pI of 7.0 and we carry out chromatography at pH 8.0 so that HHMb is slightly negative while BPTI and most of its mutants are positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column on Affi-Gel 10 (TM) or Affi-Gel 15 (TM) at 4.0 mg/ml support matrix, the same density that is optimal for a column supporting trp.
To remove variants of BPTI with strong, indiscriminate binding for any protein or for the support matrix (Sec. 15.2), we pass the variegated population of virions over a column that supports bovine serum albumin (BSA) before loading the population onto the {HHMb} column. Affi-Gel 10 (TM) or Affi-Gel 15 (TM) is used to immobilize BSA at the highest level the matrix will support. A 10.0 ml column is loaded with 5.0 ml of Affi-Gel-linked BSA; this column, called {BSA}, has V_v = 5.0 ml. The variegated population of virions containing 10^? pfu in 1 ml (0.2 x V_v) of 10 mM KCl, 1 mM phosphate, pH 8.0 buffer is applied to {BSA}. We wash {BSA} with 4.5 ml (0.9 x V_v) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer. The wash with 50 mM salt will elute virions that adhere slightly to BSA but not virions with strong binding. The pooled effluent of the {BSA} column is 5.5 ml of approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment with 10^? virions of M13(am429) in 100 μl of 10 mM KCl buffered to pH 8.0 with phosphate; the column is washed with the same buffer until OD260 returns to baseline or 2 x V_v have passed through the column, whichever comes first. The pooled effluent from {BSA} is added to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH 8.0 buffer. The column is eluted (Sec. 15.3) in the following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to baseline or 2 x V_v, whichever is first (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at 8.0 with phosphate (30 x 100 μl fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate buffer to pH 8.0 (30 x 100 μl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x V_v, with phosphate buffer to pH 8.0 (20 x 100 μl fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 x V_v, with phosphate buffer to pH 8.0 (10 x 100 μl fractions).
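To relate a fraction number to the eluent composition, the steps above can be tabulated. The sketch below assumes ideal linear gradients and V_v = 1.0 ml for {HHMb} (inferred from 30 x 100 μl fractions = 3 x V_v); it gives the nominal KCl and guanidinium Cl concentrations at the midpoint of each of the 90 fractions collected in steps 2 to 5.

```python
def elution_schedule(vv_ml: float = 1.0, fraction_ml: float = 0.1):
    """(step, KCl in mM, guanidinium Cl in M) at the midpoint of each fraction
    collected in elution steps 2-5, assuming ideal linear gradients."""
    steps = [
        # step, volume (x Vv), KCl start, KCl end (mM), GdmCl start, end (M)
        (2, 3, 10.0, 2000.0, 0.0, 0.0),
        (3, 3, 2000.0, 5000.0, 0.0, 0.0),
        (4, 2, 5000.0, 5000.0, 0.0, 0.8),
        (5, 1, 5000.0, 5000.0, 0.8, 0.8),
    ]
    fractions = []
    for step, n_vv, k0, k1, g0, g1 in steps:
        n = round(n_vv * vv_ml / fraction_ml)   # number of fractions in this step
        for i in range(n):
            x = (i + 0.5) / n                    # midpoint position within the step
            fractions.append((step, k0 + (k1 - k0) * x, g0 + (g1 - g0) * x))
    return fractions


for number, (step, kcl, gdm) in enumerate(elution_schedule(), start=1):
    if number % 10 == 0:
        print(f"fraction {number:2d} (step {step}): {kcl:6.0f} mM KCl, {gdm:.2f} M GdmCl")
```

Real gradients lag and broaden on the column, so these values are nominal.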

In addition to the elution fractions, a sample of the support matrix is removed from the column and used as an inoculum for phage-sensitive Sup+ cells (Sec. 15.4). A sample of 4 μl from each fraction is plated on phage-sensitive Sup+ cells. Fractions that yield too many colonies to count are replated at higher dilution. An approximate titre of each fraction is calculated. Starting with the last fraction and working toward the first fraction that was titered, we pool fractions until approximately 10^? phage are in the pool, i.e., about 1 part in 1000 of the phage applied to the column. This population is infected into 3 x 10^? phage-sensitive PE384 in 300 ml of LB broth. The low multiplicity of infection is chosen to reduce the possibility of multiple infection. After thirty minutes, viable phage have entered recipient cells but have not yet begun to produce new phage. Phage-borne genes are expressed at this phase, and we can add ampicillin, which will kill uninfected cells. These cells still carry F-pili and will absorb phage, helping to prevent multiple infections.
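The effect of a low multiplicity of infection (MOI) can be quantified if phage adsorb to cells at random, a standard Poisson assumption. The sketch below returns the fraction of infected cells that received two or more phage; because the exponents of the phage and cell numbers are not legible above, the MOI is left as an input.

```python
import math


def multiple_infection_fraction(moi: float) -> float:
    """Fraction of infected cells that received two or more phage, assuming
    random (Poisson) adsorption with mean `moi` phage per cell."""
    p0 = math.exp(-moi)            # cells with no phage
    p1 = moi * math.exp(-moi)      # cells with exactly one phage
    return (1.0 - p0 - p1) / (1.0 - p0)


for moi in (0.001, 0.01, 0.1, 1.0):
    print(moi, round(multiple_infection_fraction(moi), 4))
```

At an MOI of 0.1 about 5% of infected cells are multiply infected; at an MOI of 1 the figure rises to about 42%.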

If multiple infection should pose a problem that cannot be solved by growth at low multiplicity of infection on F+ cells, the following procedure can be employed to obviate the problem. Virions obtained from the affinity separation are infected into F+ E. coli and cultured to amplify the genetic messages (Sec. 15.5). CCC DNA is obtained either by harvesting RF DNA or by in vitro extension of primers annealed to ss phage DNA. The CCC DNA is used to transform cells at a high ratio of cells to DNA. Individual virions obtained in this way should bear proteins encoded only by the DNA within.

The variegation produces as many as 6.4 x 10^7 different amino-acid sequences. C_eff is 900. Thus, after two separation cycles, the probability of isolating a single SBD is less than 0.10; after three cycles, the probability rises above 0.10.
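The figure 6.4 x 10^7 is 20^6: all twenty amino acids at each of the six varied positions. As a rough illustration only (not the probability calculation behind the 0.10 figures), one can compare the cumulative enrichment C_eff^n after n cycles with the number of sequences: two cycles give 8.1 x 10^5, short of 6.4 x 10^7, while three give 7.29 x 10^8, which exceeds it.

```python
N_VARIED = 6    # residues varied in the first round
C_EFF = 900     # effective enrichment per separation cycle (from the text)

n_sequences = 20 ** N_VARIED   # 6.4 x 10^7 amino-acid sequences

for cycles in (1, 2, 3):
    enrichment = C_EFF ** cycles
    verdict = "exceeds" if enrichment > n_sequences else "falls short of"
    print(f"{cycles} cycle(s): enrichment {enrichment:.3g} {verdict} {n_sequences:.2g}")
```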

The phagemid population is grown and chromatographed three times and then examined for SBDs (Sec. 15.7). In each separation cycle, phage from the last three fractions that contain viable phage are pooled with phage obtained by removing some of the support matrix as an inoculum. At each cycle, about 10^? phage are loaded onto the column and about 10^? phage are cultured for the next separation cycle.

After the third separation cycle, 32 colonies are picked from the last fraction that contained viable phage; phage from these colonies are denoted SBD1, SBD2, ..., and SBD32.

Each of the SBDs is cultured and tested for retention on a Pep-Tie column supporting HHMb (Sec. 15.8). Phage LG7 (SBD11) shows the greatest retention on the Pep-Tie {HHMb} column, eluting at 367 mM KCl, while wild-type M13 elutes at 20 mM KCl. SBD11 becomes the parental amino-acid sequence for the second variegation cycle.

The result of this hypothetical experiment is shown in Table 38: R40 changed to D, I42 changed to Q, A50 changed to E, L52 remained, V57 changed to S, and A71 changed to W.

The next round of variegation (Sec. 16) is illustrated in Table 39. The residues to be varied are chosen by: a) choosing some of the residues in the principal set that were not varied in the first round (viz. residues 42, 44, 51, 54, 55, 72, or 75 of the fusion), and b) choosing some residues in the secondary set. Residues 51, 54, 55, and 72 are varied through all twenty amino acids and, unavoidably, stop. Residue 44 is only varied between Y and F. Some residues in the secondary set are varied through a restricted range, primarily to allow different charges (+, 0, -) to appear. Residue 38 is varied through K, R, E, or G. Residue 41 is varied through I, V, K, or E. Residue 43 is varied through R, G, N, K, D, E, T, or A.

Olig#29 and olig#30 are synthesized, annealed, extended, and cloned into pLG7 at the ApaI/SphI sites. The ligation mixture is used to transform 5 l of competent PE383 cells so that 10^? transformants are obtained. A new {HHMb} is constructed using the same support matrix as was used in round 1. A sample of 10^? of the harvested LG7 is applied to {HHMb} and affinity separated. The last 10^? phage off the column and an inoculum are pooled and cultured. The cultured phagemids are re-chromatographed for three separation cycles. Thirty-two clonal isolates (denoted SBD11-1, SBD11-2, ..., SBD11-32) are obtained from the effluent of the third separation cycle and tested for binding on a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows the greatest retention on the Pep-Tie {HHMb} column, eluting at 692 mM KCl.

The results of this hypothetical selection are shown in Table 40. Residue 38 (K15 of BPTI) changed to E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to F, 54 goes to S, 55 goes to A, and 72 goes to Q.

The sbd11-23 portion of the osp-pbd gene is cloned into an expression vector and BPTI{E15, D17, V18, Q19, N20, F21, E27, F28, L29, S31, A32, S34, W71, Q72} is expressed in the periplasm. This protein is isolated by standard methods and its binding to HHMb is tested. The K_d is found to be 4.5 x 10^-? M.

A third, round of variation, using SBDll-23 as 
PPBD, is illustrated in Table 41; eight amino acids are 
varied. Those in the principal set, residues 40, 55, 
25 and 57, are varied through all twenty amino acids* 
• Residue 32 is varied through P, Q, T, K, A, or E. 
Residue 34 is varied through T, P, Q, K, A, or E. 
Residue 44 is varied through F, L, Y, C, W, or stop. 
Residue 50 is varied through E, K, or Q- Residue 52 is. 
3 0 varied through L, F, I, M, or V. 
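Multiplying the number of alternatives listed for each varied position gives the number of distinct outcomes in each later round. In the sketch below the counts are read from the text; "all twenty amino acids" counts as 20 with stop not counted, while residue 44 in round 3 counts its six listed outcomes, one of which is stop. These are counts of possible sequences, not of clones actually obtained.

```python
import math

# Alternatives allowed at each varied position (fusion numbering), read from the
# text. "All twenty amino acids" counts as 20; stop is not counted, except that
# residue 44 in round 3 lists stop as one of its six outcomes.
ROUND2 = {38: 4, 41: 4, 43: 8, 44: 2, 51: 20, 54: 20, 55: 20, 72: 20}
ROUND3 = {32: 6, 34: 6, 40: 20, 44: 6, 50: 3, 52: 5, 55: 20, 57: 20}


def library_size(choices):
    """Number of distinct outcomes: product of per-position alternatives."""
    return math.prod(choices.values())


for name, plan in (("round 2", ROUND2), ("round 3", ROUND3)):
    print(f"{name}: {len(plan)} residues, {library_size(plan):.2e} sequences")
```

Round 2 comes to about 4.1 x 10^7 sequences and round 3 to about 2.6 x 10^7, both below the 6.4 x 10^7 of the first round.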

The result of this variation is shown in Table 42. The selected SBD is denoted SBD11-23-5 and elutes from a Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5 segment is cloned into an expression vector and BPTI(E9, Q11, A17, V18, N20, W21, Q27, F28, M29, S31, L32, H34, W71, Q72) is produced. This time the K_d is 7.3 x 10^-? M.

This example is hypothetical. It is anticipated that more variegation cycles will be needed to achieve dissociation constants of 10^-? M. It is also possible that more than three separation cycles will be needed in some variegation cycles. Real DNA chemistry and DNA synthesizers may have larger errors than our hypothetical 5%. If S_err > 0.05, then we may not be able to vary six residues at once. Variation of 5 residues at once is certainly possible.
Table 2: Preferred Outer-Surface Proteins

Genetic Package      Preferred Outer-Surface Protein   Reason for preference
M13                  coat protein (gpVIII)             a) exposed amino terminus,
                                                       b) predictable post-translational processing,
                                                       c) numerous copies in virion.
                     gp III                            a) fusion data available.
PhiX174              G protein                         a) known to be on virion exterior,
                                                       b) small enough that the G-ipbd gene can replace the H gene.
E. coli              LamB                              a) fusion data available,
                                                       b) non-essential.
B. subtilis spores   CotC                              a) no post-translational processing,
                                                       b) distinctive sequence that causes protein to localize in spore coat,
                                                       c) non-essential.
                     CotD                              Same as for CotC.
Table 7: Atomic radii

Atom            Radius (Angstroms)
O (carbonyl)    1.52
N (amide)       1.55
Other atoms     1.80



Table 8: Fraction of DNA molecules having n non-parental bases when the reagents have fraction M of parental nt.

M        .9965     .97716    .92612    .8577     .79433    .63096
f0       .9000     .5000     .1000     .0100     .0010     .000001
f1       .09499    .35061    .2393     .04977    .00777    .0000175
f2       .00485    .1188     .2768     .1197     .0292     .000149
f3       .00016    .0259     .2061     .1854     .0705     .000812
f4       .000004   .00409    .1110     .2077     .1232     .003207
f8       0.        2x10^-7   .00096    .0336     .1182     .080165
f16      0.        0.        0.        5x10^-7   .00006    .027281
f23      0.        0.        0.        0.        0.        .0000089
most     0         0         2         5         7         12

"most" is the value of n having the ... probability.
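The f_n entries of Table 8 agree with a binomial model: the f0 row equals M^30 (for example, .97716^30 = 0.50), so the table appears to describe a 30-base variegated segment in which each base is independently parental with probability M. The sketch below assumes that model and regenerates the rows; differences in the last digit come from rounding of M.

```python
from math import comb

SEGMENT_LENGTH = 30  # assumed: f0 = M**30 reproduces the f0 row of Table 8


def fraction_with_n_errors(n: int, m: float, length: int = SEGMENT_LENGTH) -> float:
    """Probability that exactly n of `length` bases are non-parental when each
    base is independently parental with probability m (binomial model)."""
    return comb(length, n) * (1.0 - m) ** n * m ** (length - n)


M_VALUES = (0.9965, 0.97716, 0.92612, 0.8577, 0.79433, 0.63096)
for n in (0, 1, 2, 3, 4, 8, 16, 23):
    row = (fraction_with_n_errors(n, m) for m in M_VALUES)
    print(f"f{n:<3}", "  ".join(f"{p:.3g}" for p in row))
```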
Table 9: best vgCodon

Program "Find Optimum vgCodon."
INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R+K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the best distribution and the abundances.
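Table 9 can be turned into a runnable search. The Python sketch below is an illustration, not the program used for these tables: it assumes that CALCULATE-ABUNDANCES sums q x f x k over the standard genetic code with k = 50% G and 50% T, and that COMPARE-ABUNDANCES keeps the distribution with the largest ratio of least-favored to most-favored amino acid (stop excluded), the figure of merit reported in Table 10. With q = (T 0.26, C 0.18, A 0.26, G 0.30) and f = (T 0.22, C 0.16, A 0.40, G 0.22), `abundances` reproduces Table 10 (for example W 2.86%, S 7.02%, stop 5.20%).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    x + y + z: CODE[16 * i + 4 * j + m]
    for (i, x), (j, y), (m, z) in product(enumerate(BASES), repeat=3)
}
K_MIX = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}  # third base "k": G or T


def abundances(q, f, k=K_MIX):
    """Abundance of each amino acid (and stop, '*') encoded by the vgCodon q-f-k,
    assuming the three positions are mixed independently."""
    out = {}
    for codon, aa in GENETIC_CODE.items():
        out[aa] = out.get(aa, 0.0) + q[codon[0]] * f[codon[1]] * k[codon[2]]
    return out


def ratio(ab):
    """Least-favored / most-favored amino-acid abundance, stop excluded."""
    values = [v for aa, v in ab.items() if aa != "*"]
    return min(values) / max(values)


def grid(lo_pct, hi_pct):
    """Values lo_pct/100 ... hi_pct/100 in steps of 0.01."""
    return [p / 100 for p in range(lo_pct, hi_pct + 1)]


def find_optimum():
    """Grid search following Table 9; returns (ratio, q, f) for the best point."""
    best = None
    for t1, c1, a1 in product(grid(21, 31), grid(13, 23), grid(23, 33)):
        g1 = 1.0 - t1 - c1 - a1                  # g1 from the other concentrations
        if g1 < 0.15:
            continue
        for a2, c2 in product(grid(37, 50), grid(12, 20)):
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)  # force D+E = R+K
            t2 = 1.0 - a2 - c2 - g2
            if g2 > 0.1 and t2 > 0.1:
                q = {"T": t1, "C": c1, "A": a1, "G": g1}
                f = {"T": t2, "C": c2, "A": a2, "G": g2}
                r = ratio(abundances(q, f))
                if best is None or r > best[0]:
                    best = (r, q, f)
    return best
```

`find_optimum()` visits roughly 1.7 x 10^5 grid points and may take several seconds in CPython; because g2 is computed rather than stepped, its result need not match Table 10 to the last digit.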
Table 10: Abundances obtained from optimum vgCodon

Amino acid   Abundance        Amino acid   Abundance
A            4.80%            C            2.86%
D            6.00%            E            6.00%
F            2.86%            G            6.60%
H            3.60%            I            2.86%
K            5.20%            L            6.82%
M            2.86%            N            5.20%
P            2.88%            Q            3.60%
R            6.82%            S            7.02% mfaa
T            4.16%            V            6.60%
W            2.86% lfaa       Y            5.20%
stop         5.20%

ratio = Abun(W)/Abun(S) = 0.4074

n    (1/ratio)^n    (ratio)^n        stop-free
1        2.454      .4074            .9480
2        6.025      .1660            .8987
3       14.788      .0676            .8520
4       36.298      .0275            .8077
5       89.095      .0112            .7657
6      218.7        4.57 x 10^-3     .7258
7      536.8        1.86 x 10^-3     .6881

lfaa = least-favored amino acid; mfaa = most-favored amino acid.
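The lower part of Tables 10 and 12 follows from two numbers. With n independent varied codons, (ratio)^n is the abundance of the least-favored n-residue sequence relative to the most-favored one, and the stop-free column is the chance that none of the n codons is a stop, (1 - Abun(stop))^n. The sketch below regenerates those columns as a check of the arithmetic; it is not part of the original programs.

```python
def figure_of_merit_table(ratio: float, stop_abundance: float, max_n: int = 7):
    """Rows (n, (1/ratio)**n, ratio**n, stop-free fraction) for n = 1..max_n,
    assuming the n varied codons are synthesized independently."""
    return [(n, (1.0 / ratio) ** n, ratio ** n, (1.0 - stop_abundance) ** n)
            for n in range(1, max_n + 1)]


# Table 10: ratio = Abun(W)/Abun(S), stop = 5.20%
for n, inverse, power, stop_free in figure_of_merit_table(0.4074, 0.0520):
    print(f"{n}  {inverse:9.3f}  {power:.3g}  {stop_free:.4f}")
```

Calling `figure_of_merit_table(0.3248, 0.0527)` gives the corresponding columns of Table 12.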
Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given distribution."
INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is the error level, as a fraction (0.05 for 5%).
READ Serr
Comment T1i, C1i, A1i, G1i, T2i, C2i, A2i, G2i, T3i, G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1. - Serr
Fup = 1. + Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps )
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps )
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps )
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr )
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr )
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps )
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps )
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps )
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr )
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr )
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
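The "push it back" step of Table 11 keeps the dependent nucleotide fraction (g1 for the first base, t2 for the second) within Serr of its intended value while the other three fractions are scanned. It then rescales those three so that the mixture still sums to 1. The Python sketch below reproduces only that step and the first-base scan. The abundance calculation and the worst-case comparison are left out, and reading "in 7 steps" as seven evenly spaced values including both end points is an assumption.

```python
def push_back(free, dependent, intended, serr):
    """Clamp `dependent` to intended*(1 -/+ serr) and rescale the three `free`
    fractions so that all four still sum to 1 (Table 11's push-it-back step)."""
    lo, hi = intended * (1.0 - serr), intended * (1.0 + serr)
    if lo <= dependent <= hi:
        return list(free), dependent
    dependent = lo if dependent < lo else hi
    factor = (1.0 - dependent) / sum(free)
    return [x * factor for x in free], dependent


def scan(value, serr, steps=7):
    """`steps` evenly spaced values from value*(1-serr) to value*(1+serr)."""
    lo, hi = value * (1.0 - serr), value * (1.0 + serr)
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def first_base_variants(t1i, c1i, a1i, g1i, serr):
    """Yield perturbed (t1, c1, a1, g1) mixtures for the first vgCodon base."""
    for t1 in scan(t1i, serr):
        for c1 in scan(c1i, serr):
            for a1 in scan(a1i, serr):
                free, g1 = push_back((t1, c1, a1), 1.0 - t1 - c1 - a1, g1i, serr)
                yield (*free, g1)


variants = list(first_base_variants(0.26, 0.18, 0.26, 0.30, 0.05))
print(len(variants), "first-base mixtures; first:", [round(x, 4) for x in variants[0]])
```

The second-base scan works the same way, with t2 as the dependent fraction and a2, c2, g2 as the free ones.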
Table 12: Abundances obtained using optimum vgCodon assuming 5% errors

Amino acid   Abundance        Amino acid   Abundance
A            4.59%            C            2.76%
D            5.45%            E            6.02%
F            2.49% lfaa       G            6.63%
H            3.59%            I            2.71%
K            5.73%            L            6.71%
M            3.00%            N            5.19%
P            3.02%            Q            3.97%
R            7.68% mfaa       S            7.01%
T            4.37%            V            6.00%
W            3.05%            Y            4.77%
stop         5.27%

ratio = Abun(F)/Abun(R) = 0.3248

n    (1/ratio)^n    (ratio)^n        stop-free
1        3.079      .3248            .9473
2        9.481      .1055            .8973
3       29.193      .03425           .8500
4       89.888      .01112           .8052
5      276.78       3.61 x 10^-3     .7627
6      852.22       1.17 x 10^-3     .7225
7     2624.1        3.81 x 10^-4     .6844
Table 13:

R #   1  2  3  4  5  6  7  8
-3    -  -  -  F  -  -  -  -
-2    -  -  -  Q  T  -  -  -
-1    -  -  -  T  E  -  -  -
 1    R  R  R  P  R  R  R  R
 2    P  P  P  P  P  P  P  P
 3    D  D  D  D  D  D  D  D
 4    F  F  F  L  F  F  F  F
 5    C  C  C  C  C  C  C  C
 6    L  L  L  Q  L  L  L  L
 7    E  E  E  L  E  E  E  E
 8    P  P  P  P  P  P  P  P
 9    P  P  P  Q  P  P  P  P
10    Y  Y  Y  A  Y  Y  Y  Y
11    T  T  T  R  T  T  T  T
12    G  G  G  G  G  G  G  G
13    P  P  P  P  P  P  P  P
14    C  T  A  C  C  C  C  C
15    K  K  K  K  K  V  G  A
16    A  A  A  A  A  A  A  A
17    R  R  R  A  A  R  R  R
18    I  I  I  L  M  I  I  I
19    I  I  I  L  I  I  I  I
20    R  R  R  R  R  R  R  R
21    Y  Y  Y  Y  Y  Y  Y  Y
22    F  F  F  F  F  F  F  F
23    Y  Y  Y  Y  Y  Y  Y  Y
24    N  N  N  N  N  N  N  N
25    A  A  A  S  A  A  A  A
26    K  K  K  T  K  K  K  K
27    A  A  A  S  A  A  A  A
28    G  G  G  N  G  G  G  G
29    L  L  L  A  F  L  L  L
30    C  C  C  C  C  C  C  C
31    Q  Q  Q  E  E  Q  Q  Q
32    T  T  T  P  T  T  T  T
33    F  F  F  F  F  F  F  F
34    V  V  V  T  V  V  V  V
35    Y  Y  Y  Y  Y  Y  Y  Y
36    G  G  G  G  G  G  G  G
37    G  G  G  G  G  G  G  G
38    C  T  A  C  C  C  C  C
39    R  R  R  Q  R  R  R  R
40    A  A  A  G  A  A  A  A
41    K  K  K  N  K  K  K  K
42    R  R  R  N  S  R  R  R
43    N  N  N  N  N  N  N  N



Homologues



9 


10 


11 


12 


13 


14 


15 


16 


17 


18 


19 


- 


— 


— 


■— 


— 


— 


— 




Z 






- 


— 


■ — 


Q 


— 


' — 




H 


G 


z 




- 


— 


- — 


P 


— 


, — 




D 


D 


G 




R 


R 


R 


L 


A 


R 


R 


R 


K 


R 


A 


P 


P 


P 


R 


A 


P 


P 


P 


R 


P 


A 


D 


D 


D 


K 


K 


D 


R 


'T 


D 


S 


K 


F 


F 


F 


L 


Y 


F 


F 


F 


I 


F 


Y 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


L 


L 


L 


I 


K 


E 


E 


N 


R 




K 


E 


E 


E 


L 


L 


L 


Ii^ 


L 


li 


L 


L 


P 


P 


P 


H 


P 


P 


P 


P 


P 


"P 


P 


P 


P 


P 


R 


L 


.A 


A 


P 


P 


A 


V 


Y 


Y 


Y 


N 


R 


E 


E 


E 


E 


E 


R 


T 


T 


T 


P 


I 


T 


T 


■ S 


Q 


T 


Y 


G 


G 


G 


G 


G 


G 


G 


G 


G 




G 


P 


P 


P 


R 


P 


L 


L 


R 


P 


P 


P 


C 


■c 


C 


C 


C 


C 


C 


C 


C 


C 


C 


L 


I. 


K 


Y 


K 


K 


K 


R 


K 




K 


A 


A 


A 


Q 


R 


A 


A 


G 


G 


A 


K 


R 


R 


R 


K 


K 


Y 


R 


. H 


R 


S 


K 


I 


I 


I 


I 


I 


I 


I 


I 


L 


I ' 


F 


I 


I 


I 


P 


P 


R 


R 


R 


P 


R 


P. 


R 


R 


R 


A 


S 


S 


S 


R 


R 


Q 


S 


Y 


Y 


Y 


F 


F 


F 


F 


I 


Y 


Y 


F 


F 


F 


F 


Y 


Y 


H 


H 


Y 


F 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


N 


N 


. N 


N 


K 


N 


N 


N 


N 


N 


N 


A 


A 


A 


Q 


W 


L 


R 


L' 


P 


S 


w 


K 


K 


K 


K 


K 


A 


A 


E 


A 


K 


K 


A 


A 


A 


K 


A 


A 


A 


S 


S 


S 


A 


G 


G 


G 


K 


K 


Q 


Q 


N 


R 


G 


-rr 

K 


L 


L 


L 


. Q 


Q 


Q 


Q 


K 


M 


G 


Q 


C 


C 


C 


c 


C 


C 


c 




C- 






Q 


Q 


Q 


E 


L 


li 


I- 


K 


E 


Q 


L 


T 


T 


T 


G 


P 


Q 


E. 


V 


S 


Q 


P 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


V 


V 


V 


T 


D 


I 


, I 


F 


I 


I 


N 


Y 


Y 


Y 


■ W 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


G 


G 


G 


S 


S 


G 


G 


G 


G 


G 


S 


G 


6 


G 


G 


G 


G 


G 


G 


G 


G 


G 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


R 


R 


R 


. G 


G 


G 


G 


G 


K 


R 


G 


A 


A 


A 


G 


G 


G 


G 


G 


G 


G 


G 


K 


K 


K 


N 


N 


N 


N 


N 


N 


N 


N 


R 


R 


R 


S 


A 


A 


A 


A 
Q


A 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 
Table 13, continued.



R # 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


13 


14 


15 


16 


17 


18 


19 


44 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


N 


R 


R 


R 


R 


N 




R 


R 


45 


F 


F 


F 


F 


F 


F 


F 


F 


F 


P 


F 


F 


F 


. F 


F 


F 


F 


F 


F 


46 


K 


K 


K 


E 


K 


K 


K 


K 


K 


K 


K 


K 


K 




K 


E 




D 


K 


47 


S 


S 


S 


T 


S 


S 


S 


S 


S 


S 


S 


T 


T 


T 


T 


T 


T 


T 


T 


48 


A 


A 


A 


T 


A 


A 


A 


A 


A 


A 


A 


I 


I 


I 


I 


R 


K 


T 


I 


49 


E 


E 


E 


E 


E 


E 


E 


E 


E 


E 


. E 


E 


E 


D 


D 


D 


A 




E 


50 


D 


D 


D 


M 


D 


D 


D 


D 


D 


D 


D 


E 


E 


E 


E 


E 


E 


Q 


E 


51 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


52 


M 


M 


M 


L 


M 


M 


M 


M 


M 


M 


E 


R 


R 


R 


H 


R 


V 


Q 


R 


53 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


R 


E 


R 


G 


R 


54 


T 


T 


T 


I 


T 


T 


T 


T 


T 


T 


T 


T 


T 


T 


T 


T 


A 


V 


T 


55 


C 


C 


C 


C 


C- 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


56 
57


G 


G 


G 


P 


G 


G 


G 


G 


G 


G 


G 


R 


G 


G" 


G 


G 


P 




G 


58 
59 


A 
61
Q
63
64 








K 
R # = residue number

1 BPTI
2 Engineered BPTI from MARK87
3 Engineered BPTI from MARK87
4 Bovine Colostrum (DUFT85)
5 Bovine Serum (DUFT85)
6 Semisynthetic BPTI, TSCH87
7 Semisynthetic BPTI, TSCH87
8 Semisynthetic BPTI, TSCH87
9 Semisynthetic BPTI, TSCH87
10 Semisynthetic BPTI, TSCH87
11 Engineered BPTI, AUER87
12 Dendroaspis polylepis polylepis (Black mamba) venom I (DUFT85)
13 Dendroaspis polylepis polylepis (Black mamba) venom K (DUFT85)
14 Hemachatus hemachatus (Ringhals cobra) HHV II (DUFT85)
15 Naja nivea (Cape cobra) NNV II (DUFT85)
16 Vipera russelli (Russell's viper) RVV II (TAKA74)
17 Red sea turtle egg white (DUFT85)
18 Snail mucus (Helix pomatia) (WAGN78)
19 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (DUFT85)
Table 13, continued.



R #. 
K
D


L 


P 


4 


L 


A 


F 


F 


F 


F 


D 


S 


A 


D 


D 


F 


D 




5 


C 


c 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


6 


I 


E 


K 


Y 


Y 


N 


E 


Q 


N 


D 


D 


Ij , 


ip 


E 


7 


L 


L 


L 


L 


L 
L


L 


L 


K 


K 


E 


s 


Q 


8 


H 


I 


P 


P 


P 


L 


P 


G 


P 


P 


P 


P 


p 


A 


9 


R 


V 


A 


A 


A 


P 


K 


Y 


V 


P 


P 


P 


p . 


FG 


10 


N 


A 


E 


D 


D 


E 


V 


S 


I 


D 


D 


Y 


V 


D 


11 


P 


A 


P 


P 


P 


T 


V 


A 


R 


K 


T 


T 


T 
12


G 


G 


G 


G 


G 


G 


G 


G 


G 
K


G 


G 


G 


13 


. R 


P 


P 


R 


R 


R 


P 


P 


P 


N 


I 


P 


P 


X 


14 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


15 


Y 


M 


K 


K 


L 


N 


R 


M 


R 






K 


R 


F 


16 


D 


F 


A 


A 


A 


A 


A 


G 


A 


G 


Q 


A 


A 


G 


17 


K 


F 


S 


H 


Y 


L 


R 


M 


F . 


P 


T 


K 


G 


Y 


18 


I 


I 


I 


I 


M 


I 


ip 


T 


I 


V 


V 


M 


F 


M 


19 


P 


S 


P 


P 


P 


P 


p 


S 


Q 


R 


R 


I 


K 


• TT 

K 


20 


A 


A 


A 


R 


R 


A 


R 


R 


L 


A 


A 


R 


R 


L 


21 


F 


F 


F.. 


F 


F 


F 


Y 


Y 


W 


F 


F 


Y 


Y 


\r 

X 


22 




y 


y 


y 


Y 


Y 


Y 


F 


A 


Y 


Y 


F 


N 


C* 

S 


23 


y 


y 


y 


Y 


Y 




Y 


Y 


F 


Y . 


Y 


Y 


Y 
X


24 
s
D
D
N


N 


vr 


25 


Q 
S


G 


A 


m 

T 


P 


A 


m 


Q 


26 


K 


G 


, A 


A 


A 


H 


S 


T 


V 


R 
TS

K 


ill 


27 


K 


A 


A 


S 


S 


L 


s 


S 


K 


Ij 


A 


A 


m 

T 


m 
1 


28 


K 


N 


K 


N 


N 


H 


K 


M 


G 


K 


K 


G 


K 


T> 


29 


Q 


K 


K 


K 


K 


K 


R 


A 


K 


T 


R 


F 


Q 




30 


C 


C 


c 


C 


C 


. C 


C 


C 


C 


C 


C 


/—J 

G 


c 


/-I 


31 


E 


y 


Q 


N 


E 


Q 


E 


E 


TT 

V 


K 


TT 

V 




Lj 




32 


R 


P 


L 


K 


K 


K 


K 


T 


Ij 


A 


Q 


m 


P 


Hi 


33 


F 


F 


F 


F 


"CI 

F 


j; 




T? 

r 


i? 








V 

J? 


1. 


34 


D 


T 


H 


I 


I 


N 


I 


Q 


p 


Q 


R 


V 


K 


I 


35 


W 


y 


y 


Y 


Y 


y 


Y 


y 


• y 


y 


Y 


Y 


Y 


. Y 


36 


S 


s 


G 


G 


G 


G 


G 


• G 


G 


R 


G 


G 


G 


6 


37 


G 


G 


G 


G 


G 


G 


6 


G 


G 


G 


G 


G 


G 


G 


38 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


■ C 


39 


G 


R 


K 


P 


R 


G 


G 


M 
D


D 


K 


K 


Q 


40 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


A 


G 


G 


41 


N 


N 


N 


N 


N 


N 


N 


N 


N 


D 


D 


K 


N 


N 


42 


S 


A 


A 


A 


A 


A 


A 


G 


G 


H 


• H 


S 


G 


D 


43 


N 


N. 


N 


N 


N 


N 


N 


N 


N 


G 


G 


N 


N 


N 
Table 13, continued.











O "5 


OA 


^ Zj 




0*7 
^ / 


^ o 




^ u 


o X. 






A A 


T> 

xC 


15 

K 


rC 


iN 


TJ 

JN 


TST 


iN 


IN 


i\- 


"KT 
IN 


Pi 




xt 
XV


A C7 
r
r


T? 
r


T? 
X 


r 


r 


J? 


r 


r 


V 
X 


T? 
i? 








o 


is 


V 

i\ 


IT 
J\ 


xi 


V 


\r 
X 


Jn 




±C 


TT 

J\ 


Q 
O 


4 / 


m 


rn 
X 


m 
X 


X 


m 
X 


rn 
X 


rn 
X 


rn 
X 


G 
b 


rn 
X 


c 


-o 




rn 
X 


4o 


X 


X 


X 


W 


W 


X 


Xj 


T? 
£t 




Si 


rv 
u 


A 


T? 


T 
Xj 


49 




E 




D 


D 


D 


E 


V 

K 




m 
T 


-rr 
II 




Q 


A 


50 


E 


E 


K 


E 


E 


E 


E 


E 


E 


Ij 


L 


D 


D 


E 


51 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


C 


C 


C 


52 


R 


R 


R 


R 


R 


Q 


- E 


L 


R 


R 


R 


M 


L 


E 


53 


R 


R 


H 


Q 


H 


R 


K 


Q 


E 


C 


C 


R 


D 


Q 


54 


T 


rp 


A 


T 


T 


T 


V 


T 


Y 


E 


E 


T 


A 


K 


55 


C 


C 


C 


C 


C 


- C 


c 


c 


C 


C 


C 


C 


C 


C 


56 


I 


V 


V 


G 


V 


A 


G 


R 


G 


L 


E 


G 


S 


I 


57 


G 


V 


G 


A 


A 


A 


V 




V 
L


G 


G 


N 
S


S 


K 
P


Y 


Y 


A 


F 




59 








A 


G 


Y 


S 




G 


P 


R 








60 










I 


6 






D 













20 Dendroaspis angusticeps (Eastern green mamba) C13 S2 C3 toxin (DUFT85)

21 Dendroaspis polylepis polylepis (Black mamba) B toxin (DUFT85)

22 Dendroaspis polylepis polylepis (Black mamba) E toxin (DUFT85)

23 Vipera ammodytes TI toxin (DUFT85)

24 Vipera ammodytes CTI toxin (DUFT85)

25 Bungarus fasciatus VIII B toxin (DUFT85)

26 Anemonia sulcata (sea anemone) 5 II (DUFT85)

27 Homo sapiens HI-14 "inactive" domain (DUFT85)

28 Homo sapiens HI-14 "active" domain (DUFT85)

29 beta bungarotoxin B1 (DUFT85)

30 beta bungarotoxin B2 (DUFT85)

31 Bovine spleen TI II (FIOR85)

32 Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)

33 Bombyx mori (silkworm) SCI-III (SASA84)

Notes:

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 51, and 55.

d) all homologues have F33 and G37.

e) extra Cs in bungarotoxins form interchain cystine bridges.
Table 14: Tally of Ionizable Groups.
BPTI homologues.



J. cten u 1 i- 1 e jt 
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16 
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1 
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A 
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1 
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2 
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3 


3 
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1 


7 


13 


23 


4 


1 


5 


3 


4 


2 


1 
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3 


15 


24 


3 


2 


4 


6 


5 


1 


1 
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5 


17 


25 


1 


2 


5 


3 


3 


1 


1 


1 


5 
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3 


4 


3 


3 


0 


1 


1 


2 


14 


29 


6- 


2 


5 


7 
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2 


1 


1 


4 


22 


30 


6 


2 


6 


7 


4 


2 


1 


1 


5 


23 


31 


2 


3 


5 


4 


4 


0 


1 


1 


4 


16 


32 


3 


3 


5 


5 


4 


0 


1 


1 


4 


- 18 


33 


4 


7 


3 


1 


4 


0 


1 


1 


-7 


17 



Sequences given in Table 10.

+ is the sum K + R + NH3 - D - E - CO2, the approximate charge on the molecule at pH 7.0.

# is the sum K + R + NH3 + D + E + CO2, i.e. the number of ionized groups at pH 7.0.
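The two footnote sums can be computed directly from a sequence. Below is a minimal Python sketch, assuming a single chain with one free N-terminal amine and one free C-terminal carboxylate, and counting only K, R, D and E side chains, as the footnotes do. The function name `ionizable_tally` is illustrative, not from the source. Applied to the mature 58-residue BPTI sequence (the residues listed in Table 16), it gives a net charge of +6 and 16 ionized groups.

```python
from collections import Counter


def ionizable_tally(seq: str) -> tuple[int, int]:
    """Approximate net charge and number of ionized groups at pH 7.0.

    Mirrors the Table 14 footnotes: + = K + R + NH3 - D - E - CO2 and
    # = K + R + NH3 + D + E + CO2, with one N-terminal NH3+ and one
    C-terminal CO2- per chain. Other side chains are not counted.
    """
    counts = Counter(seq.upper())
    positive = counts["K"] + counts["R"] + 1  # +1 for the N-terminal amine
    negative = counts["D"] + counts["E"] + 1  # +1 for the C-terminal carboxylate
    return positive - negative, positive + negative


# Mature BPTI, one-letter code (residues 1-58 as listed in Table 16).
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"

charge, ionized = ionizable_tally(BPTI)
print(f"net charge {charge:+d}, ionized groups {ionized}")  # net charge +6, ionized groups 16
```

Histidine is deliberately left out, since the footnotes treat it as essentially uncharged at pH 7.0.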
Table 15: Amino acids observed at each Residue
BPTI homologues

Table 15, continued.

Res. #   Number Different   Contents                   BPTI
  40             2          G22 A11                    A
  41             3          N20 K11 D2                 K
  42             9          A11 R9 S4 G3 H2 D Q K N    R
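The "Number Different" and "Contents" columns are a frequency summary of one alignment column. A minimal Python sketch (the function name is illustrative) rebuilds the residue-42 column from its tallies, A11 R9 S4 G3 H2 D Q K N over 33 sequences, and summarizes it. Ties among singletons are ordered alphabetically here, whereas the printed table uses its own order.

```python
from collections import Counter


def summarize_column(residues: list[str]) -> tuple[int, str]:
    """Return (number of different amino acids, 'Contents' string) for one alignment column."""
    counts = Counter(residues)
    # Most frequent first; ties broken alphabetically (the printed table may order ties differently).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    contents = " ".join(aa if n == 1 else f"{aa}{n}" for aa, n in ordered)
    return len(counts), contents


# Residue 42: A11 R9 S4 G3 H2 D Q K N, 33 sequences in all.
column_42 = ["A"] * 11 + ["R"] * 9 + ["S"] * 4 + ["G"] * 3 + ["H"] * 2 + ["D", "Q", "K", "N"]
print(summarize_column(column_42))  # (9, 'A11 R9 S4 G3 H2 D K N Q')
```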
Table 16: Exposure in BPTI

Coordinates taken from Brookhaven Protein Data Bank entry 6PTI.

HEADER    PROTEINASE INHIBITOR (TRYPSIN)          13-MAY-87   6PTI
COMPND    BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND   2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR    A.WLODAWER

Solvent radius = 1.40 Å
Atomic radii given in Table 7
Areas in Angstroms-squared.

            Total  Not covered by M/C  Not covered at all
Residue      area      area  fraction      area  fraction
ARG   1    342.45    205.09    0.5989    152.49    0.4453
PRO   2    239.12     92.66    0.3875     47.56    0.1989
ASP   3    272.39    158.77    0.5829    143.23    0.5258
PHE   4         ?         ?    0.4427      43.?    0.1388
CYS   5    241.06         ?         ?      0.23    0.0010
LEU   6         ?         ?    0.5390         ?    0.4124
GLU   7    291.39    128.91    0.4424     90.39    0.3102
PRO   8    236.12    128.71    0.5451     99.98    0.4234
PRO   9    236.09    109.82    0.4652     45.80    0.1940
TYR  10    330.97    153.63    0.4642     79.49    0.2402
THR  11    249.20     80.10    0.3214     64.99    0.2608
GLY  12    184.21     56.75    0.3081     23.05    0.1252
PRO  13    240.07    130.25    0.5426     75.27    0.3136
CYS  14    237.10     75.55    0.3186     53.52    0.2257
LYS  15    310.77    200.25    0.6444    192.00    0.6178
ALA  16    209.41     66.63    0.3182     45.59    0.2177
ARG  17    351.09    243.67    0.6940    201.48    0.5739
ILE  18    277.10    100.51    0.3627     58.95    0.2127
ILE  19    278.03    146.06    0.5254     96.05    0.3455
ARG  20    339.11    144.65    0.4266     43.81    0.1292
TYR  21    333.60    102.24    0.3065     69.67    0.2089
PHE  22    306.08     70.64    0.2308     23.01    0.0752
TYR  23    338.66     77.05    0.2275     17.34    0.0512
ASN  24    264.88     99.03    0.3739     38.69    0.1461
ALA  25    211.15     85.13    0.4032     48.20    0.2283
LYS  26    313.29    216.14    0.6899    202.84    0.6474
ALA  27    210.66     96.05    0.4560     54.78    0.2601
GLY  28    186.83     71.52    0.3828     32.09    0.1718
LEU  29    280.70    132.42    0.4718     93.61    0.3335
CYS  30    238.15     57.27    0.2405     19.33    0.0812
GLN  31    301.15    141.80    0.4709     82.64    0.2744
THR  32    251.26    138.17    0.5499     76.47    0.3043
Table 16, continued.

            Total  Not covered by M/C  Not covered at all
Residue      area      area  fraction      area  fraction
PHE  33    304.27     59.79    0.1965     18.91    0.0622
VAL  34    251.56    109.78    0.4364     42.36    0.1684
TYR  35    332.64     80.52    0.2421     15.05    0.0452
GLY  36    187.06     11.90    0.0636      1.97    0.0105
GLY  37    185.28     84.26    0.4548     39.17    0.2114
CYS  38    234.56     73.64    0.3139     26.40    0.1125
ARG  39    417.13    304.62    0.7303    250.73    0.6011
ALA  40    209.53     94.01    0.4487     52.95    0.2527
LYS  41    314.60    166.23    0.5284    108.77    0.3457
ARG  42    349.06    232.83    0.6670    179.59    0.5145
ASN  43    266.47     38.53    0.1446      5.32    0.0200
ASN  44    269.65     91.08    0.3378     23.39    0.0867
PHE  45    313.22     69.73    0.2226     14.79    0.0472
LYS  46    309.83    217.18    0.7010    155.73    0.5026
SER  47    224.78     69.11    0.3075     24.80    0.1103
ALA  48    211.01     82.06    0.3889     31.07    0.1473
GLU  49    286.62    161.00    0.5617    100.01    0.3489
ASP  50    299.53    156.42    0.5222     95.96    0.3204
CYS  51    238.68     24.51    0.1027      0.00    0.0000
MET  52    293.05     89.48    0.3054     66.70    0.2276
ARG  53    356.20    224.61    0.6306    189.75    0.5327
THR  54    251.53    116.43    0.4629     51.64    0.2053
CYS  55    240.40     69.95    0.2910      0.00    0.0000
GLY  56    184.66     60.79    0.3292     32.78    0.1775
GLY  57    106.58     49.71    0.4664     38.28    0.3592
ALA  58    no position given in Protein Data Bank

"Total area" is the area measured by a rolling sphere of radius 1.4 Å where only the atoms within the residue are considered. This takes account of conformation.

"Not covered by M/C" is the area measured by a rolling sphere of radius 1.4 Å where all main-chain atoms are considered; fraction is the exposed area divided by the total area. Surface buried by main-chain atoms is more definitely covered than is surface covered by side-group atoms.

"Not covered at all" is the area measured by a rolling sphere of radius 1.4 Å where all atoms of the protein are considered.
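Each fraction in Table 16 is an exposed area divided by the residue's total area. A short Python sketch, using a few rows transcribed from the table, recomputes the fractions and lists residues whose "not covered at all" fraction exceeds 0.5. That cutoff is an illustrative assumption, not a threshold stated in the source, and the helper names are hypothetical.

```python
# (residue, total area, not covered by M/C, not covered at all) in square angstroms,
# transcribed from a few rows of Table 16.
ROWS = [
    ("LYS15", 310.77, 200.25, 192.00),
    ("ARG17", 351.09, 243.67, 201.48),
    ("PHE22", 306.08, 70.64, 23.01),
    ("TYR23", 338.66, 77.05, 17.34),
    ("ARG39", 417.13, 304.62, 250.73),
    ("CYS51", 238.68, 24.51, 0.00),
]


def fractions(total: float, mc: float, all_atoms: float) -> tuple[float, float]:
    """Exposed area divided by total area, rounded to 4 places as in the table."""
    return round(mc / total, 4), round(all_atoms / total, 4)


def highly_exposed(rows, cutoff: float = 0.5) -> list[str]:
    """Residues whose 'not covered at all' fraction exceeds an illustrative cutoff."""
    return [name for name, total, mc, all_atoms in rows
            if fractions(total, mc, all_atoms)[1] > cutoff]


for name, total, mc, all_atoms in ROWS:
    print(name, *fractions(total, mc, all_atoms))
print(highly_exposed(ROWS))  # ['LYS15', 'ARG17', 'ARG39']
```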
Table 17: Plasmids used in Detailed Example

Phage    Contents
pLG1     M13mp18 with AvaII/AatII/AccI/RsrII/SauI adaptor
pLG2     pLG1 with ampR and ColE1 of pBR322 cloned into AatII/AccI sites
pLG3     pLG2 with AccI site removed
pLG4     pLG3 with first part of osp-pbd gene cloned into RsrII/SauI sites;
         AvrII/AsuII sites created
pLG5     pLG4 with second part of osp-pbd gene cloned into AvrII/AsuII sites;
         BssHII site created
pLG6     pLG5 with third part of osp-pbd gene cloned into AsuII/BssHII sites;
         BbeI site created
pLG7     pLG6 with last part of osp-pbd gene cloned into BbeI/AsuII sites
pLG8     pLG7 with disabled osp-pbd gene, same length DNA
pLG9     pLG7 mutated to display BPTI(V15)
pLG10    pLG8 + tetR gene - ampR gene
pLG11    pLG9 + tetR gene - ampR gene
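Table 17 describes a derivation chain: each plasmid is built from the one named first in its Contents entry. A small Python sketch that records those parent links and walks back to the starting phage makes the two branches from pLG7 (pLG8 to pLG10, and pLG9 to pLG11) explicit. The dictionary is a transcription of the table for illustration, not code from the source.

```python
# Parent plasmid of each construct, transcribed from Table 17.
PARENT = {
    "pLG1": "M13mp18", "pLG2": "pLG1", "pLG3": "pLG2", "pLG4": "pLG3",
    "pLG5": "pLG4", "pLG6": "pLG5", "pLG7": "pLG6", "pLG8": "pLG7",
    "pLG9": "pLG7", "pLG10": "pLG8", "pLG11": "pLG9",
}


def lineage(name: str) -> list[str]:
    """Walk parent links from a construct back to the starting phage."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(" <- ".join(lineage("pLG11")))
```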
Table 25: Annotated Sequence of ipbd gene

5'- c|gga|ccg|tat|cca|ggc|ttt|aca|ctt|tat|   RsrII; -35
     |GCT|TCC|GGC|TCG|TAT|AAT|GTG|TGG|        52
     |AAT|TGT|GAG|CGG|ATA|ACA|ATT|            73  lac operator
     |CCT|AGG|AGG|CTC|ACT|                    88  AvrII; S.D.

Res.     aa                     codons                                           pos.  sites
  1- 10  m k k s l v l k a s    |ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|        118  AflII; NheI
 11- 20  v a v a t l v p m l    |GTT|GCT|GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|        148  NruI; KpnI
 21- 30  s f a r p d f c l e    |TCT|TTT|GCT|CGT|CCG|GAT|TTC|TGT|CTC|GAG|        178  AccIII; AvaI; XhoI
 31- 40  p p y t g p c k a r    |CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC|        208  PflMI; ApaI; DraII; PssI; BssHII
 41- 49  i i r y f y n a k      |ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|            235
 50- 60  a g l c q t f v y g g  |GCA|GGC|CTG|TGC|CAG|ACC|TTT|GTA|TAC|GGT|GGT|    268  StuI; AccI; XcaI
 61- 69  c r a k r n n f k      |TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|            295  EspI
 70- 79  s a e d c m r t c g    |TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|        325
 80- 86  g a a e g d d          |GGC|GCC|GCT|GAA|GGT|GAT|GAT|                    346  BbeI; NarI
 87- 91  p a k a a              |CCG|GCC|AAA|GCG|GCC|                            361  SfiI
 92-100  f n s l q a s a t      |TTT|AAC|TCT|CTG|CAA|GCT|TCT|GCT|ACC|            388  HindIII
101-107  e y i g y a w          |GAA|TAT|ATC|GGT|TAC|GCG|TGG|                    409  MluI
108-112  a m v v v              |GCC|ATG|GTG|GTG|GTT|                            424  BstXI; NcoI
113-120  i v g a t i g i        |ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|
121-130  k l f k k f t s k a    |AAA|CTG|TTT|AAG|AAA|TTT|ACT|TCG|AAA|GCG|        478  AsuII
131-134  s . . .                |TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT|                502  BstEII
                                |AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|        532  trp terminator
                                |CCT|GAG|G -3'                                   539  SauI

Note the following enzyme equivalences:

XmaIII = EagI
AccIII = BspMII
DraII  = EcoO109I
AsuII  = BstBI
SauI   = Bsu36I
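As a consistency check on the annotation, the codons for residues 24-81 (the BPTI portion of the ipbd gene) can be translated and scanned for some of the annotated six-base sites. This Python sketch uses the standard genetic code. The codon string is transcribed from Table 25, reading codon 58 as TAC (tyrosine, y). The helper names `translate` and `find_sites` are illustrative. Offsets are 0-based positions within this 174-base excerpt, not the gene positions printed in the table.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codons for residues 24-81 of the ipbd gene (the BPTI portion), from Table 25.
# Codon 58 is taken as TAC (y); TAG would encode a stop, not tyrosine.
BPTI_CODONS = (
    "CGTCCGGATTTCTGTCTCGAG"              # 24-30
    "CCGCCATATACTGGGCCCTGCAAAGCGCGC"     # 31-40
    "ATCATCCGTTATTTCTACAACGCTAAA"        # 41-49
    "GCAGGCCTGTGCCAGACCTTTGTATACGGTGGT"  # 50-60
    "TGCCGTGCTAAGCGTAACAACTTTAAA"        # 61-69
    "TCGGCCGAAGATTGCATGCGTACCTGCGGT"     # 70-79
    "GGCGCC"                             # 80-81
)

# A few of the six-base recognition sites annotated in the table.
SITES = {"XhoI": "CTCGAG", "ApaI": "GGGCCC", "BssHII": "GCGCGC",
         "SphI": "GCATGC", "NarI": "GGCGCC"}


def translate(dna: str) -> str:
    """Translate a reading-frame-aligned DNA string codon by codon."""
    dna = dna.upper()
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_sites(dna: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """0-based start offsets of each recognition sequence on the given strand."""
    dna = dna.upper()
    return {name: [i for i in range(len(dna) - len(s) + 1) if dna.startswith(s, i)]
            for name, s in sites.items()}


print(translate(BPTI_CODONS))  # RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA
print(find_sites(BPTI_CODONS, SITES))
```

The translation matches the BPTI sequence residue for residue, which is a quick way to catch transcription errors in a codon table of this kind. All five sites listed are palindromic, so a top-strand scan is sufficient for them.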
Table 27: DNA_synth1

5' j CCG" [ TCC ) GTC ) GGA [ CCG ] a}AT ) CCA [ GGC [ TTT { ACA ) CTT | TAT ] 

[ GCT I TCC j GGC ( TCG [ TAT | AAT | GTG | TGG [ 

I AAT I TGT I GAG [ CGG | ATA| ACAf ATT | 
ollg#4 3'" gt taa 

I CCT[ AGG[ 
gga tec 

- / 3' - olig#3 
I GCC [ GCT [• CCT [ TCG | A AA | GCG | 
egg cga gga age ttt cgc' 

1 TCT-| TAA| TAG] TGA I GGT 1 TAC] GAG t TCT I 
aga att. ate act eca atg gtc aga 

I AAG I CCC I GCC I TAA 1 TGA I GCG 1 GGC [ TTT 1 TTT I TTT 1 
ttc ggg egg att act cgc ccg aaa aaa aaa 



I CCT I GAG [ GCA [ GGT \ GAG | CG 
gga cte cgt eca etc gc - 5' 
Table 27, continued.

"Top" strand      99
"Bottom" strand  100

Overlap 23 (14 c/g and 9 a/t)

Net length 158
Table 28: DNA_seg2

5'-|gca|cca|aag|                          spacer
   |CCT|AGG|AGG|CTC|ACT|                  AvrII; S.D.

Res.     aa                     codons                                       sites
  1- 10  m k k s l v l k a s    |ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|    AflII; NheI
 11- 20  v a v a t l v p m l    |GTT|GCT|GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|    NruI; KpnI
 21- 30  s f a r p d f c l e    |TCT|TTT|GCT|CGT|CCG|GAT|TTC|TGT|CTC|GAG|    AccIII; AvaI; XhoI
 31- 40  p p y t g p c k a r    |CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC|    PflMI; ApaI; DraII; PssI; BssHII
 41- 43  i i r                  |atc|atc|cgt|
127-129  t s k                  |ACT|TCG|AAa|gcg|gct|gcg| -3'                AsuII; spacer
Table 30: DNA_seq3

Res.     aa                     codons                                           sites
 39- 40  a r                 5'-|ccc|tgc|aca|GCG|CGC|                            spacer; BssHII
 41- 49  i i r y f y n a k      |ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|
 50- 60  a g l c q t f v y g g  |GCA|GGC|CTG|TGC|CAG|ACC|TTT|GTA|TAC|GGT|GGT|    StuI; AccI; XcaI
 61- 69  c r a k r n n f k      |TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|            EspI
 70- 79  s a e d c m r t c g    |TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|        XmaIII; SphI
 80- 81  g a
wo 90/02809 



PCT/US89/03731 



233 

Table 30, continued.

                                |GGC|GCC|gct|gaa|                                BbeI; NarI; spacer
127-129  t s k              |ttt|acT|TCG|AAa|gcg|tcg|ccg| -3'                    AsuII
Table 32: DNA_seq4

Res.     aa                     codons                                               sites
 80- 86  g a a e g d d       5'-|cct|cgc|cct|GGC|GCC|GCT|GAA|GGT|GAT|GAT|            spacer; BbeI; NarI
 87- 91  p a k a a              |CCG|GCC|AAA|GCG|GCC|                                SfiI
 92-100  f n s l q a s a t      |TTT|AAC|TCT|CTG|CAA|GCT|TCT|GCT|ACC|                HindIII
101-107  e y i g y a w          |GAA|TAT|ATC|GGT|TAC|GCG|TGG|                        MluI
108-112  a m v v v              |GCC|ATG|GTG|GTG|GTT|                                BstXI; NcoI
113-120  i v g a t i g i        |ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|
121-129  k l f k k f t s k      |AAA|CTG|TTT|AAG|AAA|TTT|ACT|TCG|AAa|gcg|tcg|ggc| -3' AsuII; spacer
Table 34: Some interaction sets in BPTI.

Res.   Number Diff. AAs   Contents   BPTI   Set: 1   2   3   4   5


—5 


2 


D -: 


32 














—4 


2 


E 


32 














—3 


5 


T P 


F Z -29 














—2 


10 


Z3 R3 Q2 T2 H G L K E -18 














— 1 


10 


D4 T2 P2 Q2 E G N K R -18 














1 


10 


R21 


A2 K2 H2 P L I T G D 












5' 


2 


9 


P20 


R4 A2 H2 N E V F L 


P 








S 




3 


10 


D15 


K6 T3 R2 P2 S Y G A L 


D 








4 


& 


4 


7 


F19 


D4 L3 Y2 12 A2 S 


F 








s 


5 


5 


1 


C33 




C 












6 


10 


ill 


E5 NA K3 Q2 12 Y2 D2 T R 


L 








4 




7 


5 


L1.8 


Ell K2 S Q 


E 






s 


4 




8 


7 


P26 


H2 A2 I L G F 


P 






3 


4 




9 


9 


P17 


A6 V3 R2 Q L K Y F 


P 




s 


3 


4 




10 


10 


Yll 


E7 D4 A2 N2 R2 V2 S I D 


Y 


s 




s 


4: 




11 


10 


T17 


P5 A3 R2 I S Q Y V K 


T 


1 


s 


3 


4= 




12 


2 


G32 


K 


G 


X 




X 


5C 





r 
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Table 34, continued. 



13 


5 


P22 


R6 L3 N I 


P 


1 


s 


4 S 


14 


3 


C31 


T A • 


C 


1 


s 


S 5 


15 


12 


kl5 


R4 Y2 M2 L2 -2 V G A I N F 


K 


1 S 


3 


4 S 


16 


7 


A22 


G5 Q2 R K D F 


A 


1 S 


s 


s. 5 


17 


12 


R12 


K5 A2 Y3 H2 S2 F2 L M T G P 


R 


1 2 


3 


s 


IS 


6 


121 


M4 F3 L2 V2 T 


I 


Is 


s 


5 


-L. ^ 


7 


111 


PIG R6 S2 K2 L Q 


I. 


1 2 


3 


s 


20 . 


5 


R19 


A7 S4 L2 Q 


R 


s s 


s 


5 




4 


Y18 


F13 W I 


Y 


2 


s 


s s 




6 


F14 


Y14 H2 A N S 


F 


' s 


3 


4. 




2 


V32 


F ' 


Y 




s 


s 






N26 


K3 D3 S 


N 


s 


3 




25 


10 


A12 


S5 Q3 P3 W3 L2 T2 K G R 


A 


s 


s 






9 


K16 


A6 T2 E2 ,S2 R2 G H V 


K 


s 


3 


4 


27 


5 


A18 


S8 K3 L2 T2 


A 


2 


3 


4 


28 


7 


G13 


KIO N5 Q2 R H M 


G 


2 


s 


s 


29 


10 


L9 Q7 K7 A2 F2 R2 M G T N 


L 


2 


3 




30 


1 


C33 




C 


X X 


: X 




31 


7 


Q12 


Ell L4 K2 V2 Y N 


Q 


2 3 


4 




32 


11 


T12 


P5 K4 Q3 E2 L2 G V S R A 


T 


2 3 


s 




33 


1 


. F33 




F 


XXX 


; X 




34 


11 


Vll 


IB T3 D2 N2 Q2 F H P R K 


V 


12 3 


s 




35 


2. 


Y31 


W2 


Y 


s s s 




5 


36 


3 


G27 


S5 R 


G ' 


1 






37 


1 


G33 




G 


X 




X 


38 


3 


C31 


T. A 


C 


1 


s 


5 


39 


7 


R13 


G9 K4 Q3 D2 P M 


R 


1 


4 


s 
Table 34, continued.



40 


2 


G22 


All 


A 


s 


s 


5 


41 


3 


N20 


Kll D2 


K 




4 


s 


42 


9 


All 


R9 S4 G3 H2 D Q K N 


. R 




s 


5 


43 . 


2 


N31 


G2 


• N 






s 


44 


3 


N21 


Rll K 


N. 






s 


45 


2 


F32 


Y 


P 






s 


46 


8 


K24 


E2 S2 D H V Y R 


K 






5 


47 


2 


T19 


S14 


S 


s 




5 


48 


9 


All 


19 E4 T2 W2 L2 R K D' 


A 


2 S 




s 


49 


7 


E19 


D6 A2 Q2 K2 T H 


E 


2 




s 


50 


6 


E16 


D12 L2 M Q K 


D 


s 




5- 


51 


1 


C33 




C 


X 




X 


52 


7 


R13 


MIO L3 E3 Q2 H V 


M 


2 




s 


53 


8 


R21 


Q3 E2 H2 C2 G K D 


R 


S ' 




5 


54 


7 


T23 


A3 V2 E2 I Y K 


T 






5 


55 


1 


C33 




C 






X 


56 


8 


G15 


V8 13 E2 R2 A L S 


G 








57 


8 


G19 


V4 A3 P2 -2 R L N 


G 








58 


8 


All 


-10 P3 K3 S2 Y2 R F 


' A 








59 


9 


-24 


G2QEAYSPR 










60 


6 


-28 


Q R I G D 










61 


3 


-31 


TP 










62 


2 


-32 












63 


^ 2 


-32 


K 










64 


2 


-32 


S 











s indicates secondary set 

X indicates in or close to surface but buried and/or highly 
conserved. 
Table 35: Distances from C-beta to Tip of Side Group, in Angstroms



Amino Acid type Distance 
A ' ■ 0.0 

C (reduced) ■ 

D ■ ■ 2.4 

■ ' E . . ' ■ - • • ^ . 

F 

G. , ■ • . ■ 
H . 

K : 

-■ L ■ 

■ . -N , -.. ■ ■ ■ 

^ 
R 

■S 
V 

w 



4.3 

4.0 
2.5 
5.1 
2.6 
3.8 
2.4 
2.4 
3.5 
6.0 
1.5 
1.5 
1.5 
5.3 
5.7 



Notes: These distances were calculated for standard model parts 
with all side groups fully extended. 
Table 36: Distances, BPTI residue set #2
Distances in Angstroms between C-beta atoms.
Hypothetical C-beta added to each Glycine.

       R17   I19   Y21   A27   G28   L29   Q31   T32   V34   A48
Y21   15.1   8.4
A27   22.6  17.1  12.2
G28   26.6  20.4  13.8   5.3
L29   22.5  15.8   9.6   5.1   5.2
Q31   16.1  10.4   6.8   6.8  10.6   6.8
T32   11.7   5.2   6.1  12.0  15.5  10.9   5.4
V34    5.6   6.5  11.6  17.6  21.7  18.0  11.4   8.2
A48   18.5  11.0   5.4  12.6  13.3   8.4   8.8   8.3  15.7
E49   22.0  14.7   8.9  16.9  16.1  12.2  13.9  13.3  19.8   5.?
M52   23.6  16.3   8.6  12.2  10.3   7.6  11.3  13.2  20.0   6.2
P9    14.0  11.3   9.0  12.2  15.4  13.3   7.9   9.2   8.7  13.9
T11    9.5  11.2  13.5  18.8  22.5  19.8  13.5  12.1   5.7  18.5
K15    7.9  14.6  20.1  27.4  31.3  27.9  21.4  18.1  10.3  24.6
A16    5.5  10.1  15.9  25.2  28.5  24.6  18.6  14.5   8.6  19.8
I18    6.1   6.0  11.2  21.3  24.4  20.2  14.7  10.4   7.0  15.0
R20   10.6   5.9   5.4  16.0  18.5  14.6   9.8   6.9   7.8  10.2
F22   15.6  10.9   5.6  10.5  12.8  10.3   6.2   8.1  10.8  10.3
N24   19.9  14.7   9.4   4.1   7.3   6.1   4.8  10.0  14.7  11.4
K26   24.4  20.1  15.2   5.4   7.7   9.8  10.1  15.3  19.0  17.0
C30   1?.9  12.1   4.6   8.8   9.5   5.3   5.9   8.2  14.9   4.9
F33   10.8   7.4   7.7  12.6  16.4  13.0   6.5   5.6   5.5  12.2
Y35    8.4   7.4   9.4  18.4  21.4  17.9  12.2   9.5   5.8  14.4
S47   17.6  10.6   6.6  17.3  17.9  13.4  12.6  10.4  15.9   5.3
D50   20.0  13.6   7.2  17.2  16.8  13.5  13.5  12.9  17.6   7.6
C51   18.9  12.2   4.0  12.1  12.2   8.8   8.8   9.7  15.3   5.4
R53   25.4  18.6  11.0  17.2  15.0  13.0  15.7  16.7  22.3   9.7
R39   15.4  16.9  17.1  24.9  27.2  24.9  20.1  18.7  13.8  22.3
Table 36, continued.

Distances in Angstroms between C-beta atoms;
Hypothetical C-beta was added to each Glycine.

       E49   M52    P9   T11   K15   A16   I18   R20   F22   N24
M52    6.1
P9    17.7  15.5
T11   22.1  21.5   7.2
K15   27.5  28.7  16.4   9.5
A16   22.2  24.2  14.9   9.8   6.2
I18   17.4  19.5  12.2   9.5  10.4   4.9
R20   13.0  13.8   8.0   9.4  14.9  10.6   6.2
F22   13.8  11.4   4.1  10.6  19.1  16.3  12.7   6.9
N24      ?  11.2   8.4  15.3  24.1  21.9  18.2  12.7   6.6
K26   20.9  15.7  12.1  18.6  27.9  26.6  23.3  18.1  11.6   5.9
C30    8.7   5.6  10.6  16.6  24.1  20.2  15.7   9.8   6.8   6.9
F33   16.5  15.4   4.2   7.1  15.0  12.8   9.6   6.1   5.6   9.3
Y35   17.2  17.8   7.8   5.8  11.0   7.6   4.9   4.3   8.8  14.8
S47    4.7   9.1  15.3  18.5  23.1  17.6  12.8   9.1  12.0  15.3
D50    5.5   7.7  14.7  18.6  24.2  19.2  14.7   9.9  11.0  14.7
C51    7.1   5.4  11.0  16.4  23.5  19.2  14.6   8.7   6.9   9.6
R53    6.3   5.6  17.9  23.1  29.6  24.8  20.3  15.0  13.8  15.5
R39   23.9  24.0  13.0   9.5  12.0  11.8  12.5  12.8  14.7  20.8

       K26   C30   F33   Y35   S47   D50   C51   R53
C30   12.4
F33   13.9  10.1
Y35   19.5  13.5   6.4
S47   21.0   8.8  13.5  13.2
D50   20.1   8.6  14.3  13.7   5.0
C51   15.0   3.7  10.9  12.5   6.9   5.2
R53   19.9   9.9  18.2  18.8   9.4   5.8   7.4
R39   24.3  20.6  14.4   9.6  20.4  19.0  18.8  23.4
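Entries in Table 36 are Euclidean distances between C-beta atoms. The source does not say how the hypothetical glycine C-beta atoms were placed. One common approach, sketched below in Python purely as an assumption, builds an idealized C-beta from the backbone N, CA and C atoms using fixed coefficients that appear in several open-source protein-modelling tools. Real coordinates would come from the 6PTI entry cited for Table 16; the sample backbone below is an idealized geometry, not BPTI data, and the function names are illustrative.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_cbeta(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Idealized C-beta position from backbone N, CA and C (assumed method, e.g. for glycine)."""
    b = sub(ca, n)
    cc = sub(c, ca)
    a = cross(b, cc)
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
                 for i in range(3))


def cbeta_distance(p: Vec, q: Vec) -> float:
    """Euclidean distance in angstroms between two C-beta positions."""
    return math.dist(p, q)


# Idealized backbone (not BPTI coordinates): N, CA, C.
cb = virtual_cbeta((-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0))
print(round(cbeta_distance(cb, (0.0, 0.0, 0.0)), 2))  # 1.53
```

The placed atom sits about 1.53 Å from CA, close to a typical CA-CB bond length, which is a quick sanity check before computing residue-to-residue distances.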
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Table 37: vgDNA to vary BPTI set #2.1 

4- 





5'- 


CAC [ CCT 


g 

35 
GGG 


P 
36 
CCC 


c 

37 
TGC 


k 
38 
AAA 


a 

39 
GGG 


X 
40 
qfk 


208 






SDacer 


Ana I 












• 

41 


+ 
X 
42 
qfk 


r 
43 
CGT 


y 

44 
TAT 


f 

45 
TTC 


y 

46 
TAG 


n 
47 
AAC 


a 
48 
GGT 


k 
49 

A2yi 




235 



/ 



X 
50 
afk 


g 

51 
GGt 


X 
52 
afk 


c 
53 
TGC 


q - 

54 
CAG 


t 
55 
ACC 


f 

56 
TTc 


olig: 


^28= 3 ' - acg gtc tgg aag 



3' = 

+ 
X 
57 



olig#27 72 nts 



qfk 



y 

58 
TAG 



g 

59 
GGT 



g 

60 
GGT 



268 



78 nts 



Overlap = 12 (7 CG, 5 AT) 



c 


r 


a 


k 


r 


11 


n 


f 


k 


61 


62 


63 


64 


65 


66 


67 


68 


69 


TGC 


CGT 


GGT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 



295 



acg. gca cga ttc 
I Esp I 



I 



ca ttg ttg aaa ttt 



s 


X 


e 


d 


c 


in 


70 


71 


72 


73 


74 


75 


TCT 


qfk 


GAG 


GAT 


TGC 


ATG 



age etc eta acg tac gca ccc acc -5' 

[ SpIi 1 1 spacer [ 

k = equal parts of T and G; m = equal parts of C and A;

q = (.26 T, .18 C, .26 A, and .30 G);

f = (.22 T, .16 C, .40 A, and .22 G);

* = complement of symbol above

Residue          40     42     50     52     57     71

Possibilities    21  x  21  x  21  x  21  x  21  x  21  =  8.6 x 10^7
Abundance x 10
  of PPBD       .768   .271   .459   .671   .600   .459

Product = 1.77 x 10^-8
Parent = 1/(5.5 x 10"^) least favored - 1/(4.2 x 10^) 
Least favored one-amino-acid substitution from PPBD present 
at 1 in 1.6 X 10^ 
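The Possibilities and Product lines are plain products over the varied residues. A Python sketch (the function name is illustrative) reproduces the Table 37 figures: 21^6 is about 8.6 x 10^7 variants, and the listed parental abundances, each divided by 10 because the table lists abundance x 10, multiply to about 1.77 x 10^-8. The same arithmetic gives the Table 41 values of 3 x 10^7 and 1.01 x 10^-7.

```python
import math


def library_stats(possibilities: list[int], abundance_x10: list[float]) -> tuple[int, float]:
    """Return (number of distinct variants, fraction of the library that is the parent).

    possibilities: number of amino acids allowed at each varied residue.
    abundance_x10: parental abundance at each residue, multiplied by 10 as in the tables.
    """
    return math.prod(possibilities), math.prod(a / 10 for a in abundance_x10)


# Table 37: residues 40, 42, 50, 52, 57 and 71, each with 21 possibilities.
n, parent = library_stats([21] * 6, [0.768, 0.271, 0.459, 0.671, 0.600, 0.459])
print(f"{n:.2e} variants, parent fraction {parent:.3g}")  # 8.58e+07 variants, parent fraction 1.77e-08
```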
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Table 38: Result of varying set#2 of BPTI 2.1 



1 


e- 


29 


3 0 


CTC 


GAG 


Ava I 


Xho I 



P 


P 


y 


t 


g 


P 


C 


k 


a 


D 


31 


32 


33 


34 


35 


36 


37 


38 


39 


40 


CCG 


CCA 


TAT 


ACT 


GGG 


CCC 


TGC 


AAA 


GCG 


GAT 



pf iM i I 

I Apa I I 
Dra II . 



Pss I 



i 


Q 


r 


y 


f 


y 


n 


a 


k 




41 


42 


43 


44 


45 


46 


47 


48 


49 




ATC 


CAG 


CGT 


TAT 


TTC 


TAG 


AAC 


GCT 


AAA 




E ' 


g 


L 


C 


q 


t 


f 


S 


y 


g 


50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


GAG 


GGC 


CTG 


TGC 


CAG 


ACC 


TTT 


TCG 


TAG 


GGT 


c 


r 


a 


k 


r 


il 


. n 


f 


k 




61 


62 


63 


64 


65 


66 


67 


68 


69 




TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 








Eso I. 














s 


W 


e 


d 


c 


m 


r 


t 


C 


g 


70 


71 


72 


73 


74 


75 


76 


77 


78 


79 


TCG 


TGG 


GAA 


GAT 


TGC 


ATG 


CGT 


ACC 


TGC 


GGT 










1 SDh 1 


:| 









235 



g 

60 
GGT 



268 



295 



325 



. g 


a 


■8 0 


81 


GGC 


GCC 


Bbe I 


Nar I 
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Table 39: vgDNA to vary set#2 BPTI 2.2 

+ 



ccr aca cac 


35 
GGG 


P 
36 

CCC 


c 
37 
TGC 


X 
38 
mrA 


a 
39 
GCG 


D 
40 
GAT 


spacer 


Arte 


i I 









X 


Q 


X 


X 


f 


y 


n 


a 


k 


41 


42 


43 


44 


45 


46 


47 


48 


49 


rwA 


CAG 


rv3c 


TwT 


TTC 


TAC 


AAC 


GCT 


AAA 



208 



235 



E 


X 


L 


C 


X 


X 


■ f 


S 


Y 


g 


g 




50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 




GAG 


qfk 


CTG 


TGC 


qfk 


*qfk 


TTT 


TCG 


TAC 


G6T 


GGT 


268 



91 nts olig#30 3'- g cca cca 
Overlap = 15 (11 CG, 4 AT) 

/" 3' olig#29 94 nts 



c 


r 


a 


k 


r 


n 


n 


f 


k 


61 


62 


63 


64 


65 


66 


67 


68 


69 


TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 



295 



acg gca cga ttc gca ttg ttg aaa ttt 
I ESP I I 



s 


W 


X 


d 


C 


m 


70 


71 


72 


73 


74 


75 


TCG 


TGG 


qfk 


GAT 


TGC 


ATG 



k 
in 
w 

f 
* 



age acc **ia eta acg tac gcg acc tgc -5' 

I Sph 1 1 spacer | 

k = equal parts of T and G; v = equal parts of C, A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);

* = complement of symbol above

Residue          38    41    43    44    51    54    55    72
Possibilities     4  x  4  x  9  x  2  x 21  x 21  x 21  x 21  =  6.2 x 10^7

Abundance x 10  2.5   2.5   .833  5.    .663  .397  .437  .602
Product = 2.3 x 10^-8

Parent = 1/(4.4 x lO^) least favored = 1/(1.25 x 10^) 
Least favored one-amino-acid substitution from PPBD present 
at 1 in 1.2 X 10*7 
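The mixtures q, f and k give the probability of each base at each codon position, so the amino-acid distribution of a variegated qfk codon follows by summing over the 32 codons the mixture can produce. This Python sketch uses the standard genetic code. It shows that qfk encodes all 20 amino acids plus one stop codon (TAG), consistent with the 21 possibilities listed for qfk residues, and gives TAG a frequency of .26 x .40 x .5 = 0.052. It is an illustration only; it is not claimed to reproduce every tabulated abundance.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Base mixtures from the legend above; k is equal parts T and G.
q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.5, "G": 0.5}


def codon_distribution(first: dict, second: dict, third: dict) -> dict[str, float]:
    """Probability of each amino acid ('*' = stop) from a mixed codon."""
    dist: dict[str, float] = {}
    for b1, b2, b3 in product(first, second, third):
        p = first[b1] * second[b2] * third[b3]
        aa = CODE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


qfk = codon_distribution(q, f, k)
print(len(qfk), round(qfk["*"], 3))  # 21 0.052
```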
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Table 40: Result of varying set#2 of BPTI 2.2 



P 
31 
CCG 



V 
41 
GTT 



E 
50 
GAG 



C 

61 
TGC 



P 
32 
CCA 



y 

33 
TAT 



t 
34 
ACT 



Ff IM I 



Q 
42 
CAG 



F 

51 
TTT 



r 
62 
CGT 



N 
43 
AAT 



L 
52 
CTG 



a 
63 
GCT 



F 
44 
TTT 



C 
53 
TGC 



k 
64 
AAG 



Esp I 



g 

35 
GGG 



P 
36 
CCC 



Apa I 



f 

.45 
TTC 



S 
54 
TCT 



r 
65 
CGT 



y 

46 
TAC 



A 
55 
GCT 



n 

66 
AAC 



C 

37 
TGC 



s 


W 


Q 


d 


c 


70 


71 


72 


73 


74 


TCG 


TGG 


CAG 


GAT 


TGC 



m 
75 
AT6 
Sph I 



n 

47 
AAC 



f 

56 
TTT 



11 
. 67 
AAC 



r 
76 
CGT 





1 


e 




29 


30 




CTC 


GAG 




Xho- I 


E 


a 


D 


O O 


39 


40 


GAG 


GCG 


GAT 


a 


k 




48 


49 




GCT 


AAA 




S 


y 


g 


57 


58 


59 


TCG 


TAG 


GGT 


f 


k 




68 


69 




TTT 


AAA 




t 


C 


g 


77 


78 


79 




TGC 


GGT 



178 



208 



g 

60 
GGT 



235 



268 



295 



325 



g 


a 


80 


81 


GGC 


GCC 


Bbe I 


Nar I 
Table 41: vgDNA set#2 of BPTI 2.3



ca aac eta 


1 

29 
CTC 


e 
30 
GAG 


1 soacer 


Xhc 


> I 





+ 
















+ 


P 


X 


y 


X 




P 


C 


E 


a 


X 


31 


32 


33 


34 


35 


36 


37 


38 


39 


40 


CCG 


vmcr 


TAT 


VHICT 


GGG 


CCC 


T6C 


GAG 


GCG 





178 



2 08 



V 
■ 41 
GTT 


Q 
42 
CAG 


N 
43 
AAT 


X 
44 
Tdk 


f 

45 
TTC 


- y 

46 
TAG 


n 
47 
AAC 


a 
48 
GCc 


k 
49 
AAa 


-3' 


67 n1 


;s o" 


Lig#34 3'- gatg 


ttg 


egg 


tfcc 





Overlap = 13 (7 CG, 6 AT) 



-3' olig#33 71 nts 



+ 




+ 






+ 




+ 








X 


F 


X 


C 


S 


X 


■ f 


X 


y 


g 


g 


50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


VAG 


TTT 


nTk 


TGC 


TCT 


qfk 


TTT 


qfk 


TAC 


GGT 


GGT 


btc 


aaa 


nam 


acg 


aga 


**m 


aaa 


**m 


atg 


cca 


cca 


c 


r 


a 


k 
















61 


62 


63 


64 
















TGC 


CGT 


GCT 


AAG 


C 















268 



k 
w 

a 

g 
f 
* 



acg gca cga ttc gcg acc ggc 
I Esp I I spacer | 

= equal parts of T and 6; m = equal parts of C and A; 
= equal parts of. A and T; n = equal parts of A,C,G,T7 
= equal parts A,G,T; v = equal parts A,C,G; 

= (,26 T, .18 Cr .26 A, and .30 G) ; 
== (.22 T, .16 C, .40 A, and .22 G) ; 
= complement of symbol above 



Residue 32 34 40 44 50 52 55 57 

Possibilities 6 x 6 x 21 x 6 x 3 x 5 x 21 x 21 = 

3 X lo7 

Abundance x 10 

of PPBD 10/6 10/6 .545 10/6 10/3 30/8 .459 .701 

product .=. 1. 01 X 10"' 

parent = l/(l x 10^) least favored = 1/(4 x 10^) 

Least favored one-amino-acid substitution from PPBD present 

at 1 in 3 X 10' - 
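The summary rows of this table follow from simple arithmetic over the variegated codons. The Python sketch below expands the equimolar codon patterns (using the legend's mixture symbols) over the standard genetic code to count the possible outcomes at each position and the share of the parental residue, then multiplies the tabulated figures to recover the number of variants and the parental abundance. The patterns at residues 32 and 34 are read here as vmG, which is an assumption because those entries are damaged in the scan; the qfk positions (40, 55, 57) use the abundances printed in the table rather than recomputed ones. Function names are illustrative.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# Equimolar mixture symbols from the legend; uppercase letters are fixed bases.
MIX = {"k": "TG", "m": "CA", "w": "AT", "n": "ACGT", "d": "AGT", "v": "ACG",
       "T": "T", "C": "C", "A": "A", "G": "G"}

def equimolar_distribution(pattern):
    """Abundance of each amino acid ('*' = stop) for an equimolar codon pattern."""
    codons = ["".join(c) for c in product(*(MIX[s] for s in pattern))]
    dist = {}
    for codon in codons:
        dist[CODE[codon]] = dist.get(CODE[codon], 0.0) + 1 / len(codons)
    return dist

# (residue, codon pattern, parental amino acid); 32 and 34 assumed to be vmG.
for residue, pattern, parent in [(32, "vmG", "P"), (34, "vmG", "T"), (44, "Tdk", "F"),
                                 (50, "vAG", "E"), (52, "nTk", "L")]:
    dist = equimolar_distribution(pattern)
    print(f"{residue} {pattern}: {len(dist)} outcomes, parent {parent} at {dist[parent]:.3f}")

# Summary rows as printed in the table (residues 32 34 40 44 50 52 55 57).
possibilities = [6, 6, 21, 6, 3, 5, 21, 21]
abundance_x10 = [10/6, 10/6, 0.545, 10/6, 10/3, 30/8, 0.459, 0.701]
print(f"variants: {prod(possibilities):,}")
print(f"parental abundance: {prod(a / 10 for a in abundance_x10):.3g}")
```

Run as written, it reports 6, 6, 6, 3 and 5 outcomes for the five equimolar positions (the Tdk count includes the TAG stop codon), 30,005,640 variants (about 3 x 10^7), and a parental abundance of about 1.01 x 10^-7, in line with the summary rows.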
Table 42: Result of varying set#2 of BPTI 2.3

       29  30  31  32  33  34  35  36  37  38
       l   e   P   E   y   Q   g   P   c   E
178   CTC GAG CCG GAG TAT CAG GGG CCC TGC GAG

       39  40  41  42  43  44  45  46  47
       a   A   V   Q   N   W   f   y   n
208   GCG GCT GTT CAG AAT TGG TTC TAC AAC

       48  49  50  51  52  53  54  55  56  57  58
       a   k   Q   F   M   c   S   L   f   H   y
235   GCT AAA CAG TTT ATG TGC TCT CTT TTT CAT TAC

       59  60  61  62  63  64  65  66  67
       g   g   c   r   a   k   r   n   n
268   GGT GGT TGC CGT GCT AAG CGT AAC AAC

       68  69  70  71  72  73  74  75  76  77
       f   k   s   W   Q   d   C   m   r   t
295   TTT AAA TCG TGG CAG GAT TGC ATG CGT ACC

       78  79  80  81
       C   g   g   a
325   TGC GGT GGC GCC

Restriction sites: Ava I and Xho I (29-30), Apa I (35-36), Esp I (63-65),
Sph I (74-76), Bbe I and Nar I (80-81).
CLAIMS

1. A method of obtaining a protein that binds a
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
other than a single chain antibody comprising (i) a
structural signal directing the display of the
protein on the outer surface of the package and
(ii) a potential binding domain for binding said
target, where a plurality of different potential
binding domains are displayed by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain,
preferably further comprising (e) determining the
amino acid sequence of a successful binding
domain,

and more preferably, further comprising (f)
preparing a new variegated population of
replicable genetic packages according to step (a),
the parental potential binding domain for the
potential binding domains of said new packages
being a successful binding domain whose sequence
was determined in step (e), and repeating steps
(b)-(e) with said new population.

2. The method of claim 1 wherein the population of
replicable genetic packages of step (a) is
obtained by:

i) preparing a variegated population of DNA
inserts each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages,

wherein preferably (1) said population is
characterized by the display of at least 10^ but
not more than 10^ different potential binding
domains and/or (2) from 1 in 10^ to 1 in 10^ of
the packages of said population display the same
potential binding domain.

3. The method of claim 1 wherein, in step (a), the
potential binding domains encoded by the nucleic
acid constructs are each related in sequence to a
parental potential binding domain by a limited
number of amino acid substitutions in the amino
acid sequence of said parental potential binding
domain, and preferably the level of variegation
of the population is chosen such that the packages
displaying potential binding domains obtained by
single amino acid substitutions in the amino acid
sequence of the parental potential binding domain
are present in detectable amounts, and preferably
the initially chosen parental potential binding
protein has at least one stable binding domain and
said domain has a melting point of at least 60°C
and is stable over a pH range of at least 3.0-8.0.

4. The method of claim 1 wherein the displayable
potential binding protein is a chimeric protein,
and preferably, wherein said signal is provided
by a segment of said chimeric protein which is
essentially identical in amino acid sequence with
at least a functional portion of a natural outer
surface protein encoded by said genetic package or
a cell naturally infected by said genetic package,
said portion directing the transport of said
chimeric protein to the outer surface of the
genetic package.

5. The method of claim 3 wherein the parental
potential binding domain is initially chosen to be
one which is over 50% homologous with a domain of
a known protein, the latter domain having a
melting point of at least about 60°C.

6. The method of claim 5 wherein the initially chosen
parental binding protein does not preferentially
bind the predetermined target.

7. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterized as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids, which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.

8. The method of claim 1 wherein the target material
is a non-macromolecular organic compound and the
potential binding domains comprise greater than
about 80 amino acid residues.

9. The method of claim 1 wherein the target material
is a non-macromolecular organic compound and the
potential binding domains comprise greater than
about 80 amino acid residues.

10. The method of claim 1 wherein the target material
is a mineral insoluble in aqueous solution.

11. The method of claim 1 wherein the target is an
inorganic molecule or complex ion that is stable
in aqueous solution.

12. The method of claim 1 wherein the target is an
organometallic compound that is stable in aqueous
solution.

13. The method of claim 1 wherein the target material
is a general protease, wherein the immobilized
target material is first incubated with an
irreversible or covalent inhibitor to inactivate
the protease.

14. The method of claim 1 wherein the replicable
genetic package is a cell or virus that can be
affinity separated and retain viability.

15. The method of claim 5 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such deleterious enzymatic activity.

16. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
the intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.

17. The method of claim 1 wherein the target contains
ionizable groups, further comprising providing
counter ions to reduce electrostatic repulsion
between the potential binding protein and the
target.
18. The method of claim 1 wherein the initial
potential binding domain is picked so that, under
the conditions of intended use of the desired
binding protein and under the conditions of
affinity separation, the potential binding
domains and the target will either have opposite
charge or one of them will be neutral.

19. The method of claim 28 wherein the replicable
genetic package is a bacterial cell, such as
a strain of Escherichia coli.

20. The method of claim 1 wherein the replicable
genetic package is a bacterial spore, such as
a Bacillus endospore, more preferably an endospore
of a strain of B. subtilis.

21. The method of claim 1 wherein the replicable
genetic package is a bacteriophage, such as a
filamentous phage, preferably a derivative of an
M13 Escherichia coli bacteriophage or a derivative
of the Pseudomonas aeruginosa filamentous phage
Pf1.

22. The method of claim 21 wherein the signal is
provided by the coat protein of M13 or a segment
thereof embodying an outer surface transport
signal.

23. The method of claim 21 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.
24. The method of claim 2 wherein the distribution of
nucleotides incorporated at each variegated codon
is chosen to yield substantially equal abundances
of acidic and basic amino acids, and preferably
the distribution of nucleotides incorporated at
each variegated codon is further chosen to yield
the largest value for the quantity {(1 - abundance
(stop codons)) times (abundance of the least
abundant amino acid)/(abundance of the most
abundant amino acid)}.

25. The method of claim 1, wherein step (c) further
comprises contacting the packages with a second
material and isolating packages which do not bind
that second material.

26. The method of claim 1, wherein after obtaining a
novel binding protein recognizing a first
predetermined target, the novel binding protein is
chosen as a parental potential binding protein for
the isolation of a derivative protein which also
binds to a second predetermined target.

27. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 50°C.
28. The method of claim 36 wherein the outer surface
transport signal is provided by the lamB protein
or a segment thereof embodying an outer surface
transport signal.

29. The method of claim 38 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.

30. A chimeric protein comprising (i) at least a
segment of an outer surface protein of a cell or
virus, said segment providing an outer surface
transport signal recognized by said cell or virus,
and (ii) a domain foreign to said outer surface
protein, and, preferably, said foreign domain
binds to a target material not preferentially
bound by said outer surface protein.

31. A replicable genetic package which contains a
nucleic acid construct which codes on expression
for the chimeric protein of claim 30.

32. The method of claim 1 wherein in at least one
instance the amino acid residues varied in a first
assortment of potential binding domains are left
constant in the next assortment of potential
binding domains.

33. A method of preparing a population of variegated
DNA wherein the distribution of nucleotides
incorporated at each variegated codon is chosen to
yield substantially equal abundances of acidic and
basic amino acids, and, preferably, the
distribution of nucleotides incorporated at each
variegated codon is further chosen to yield the
largest value for the quantity {(1 - abundance(stop
codons)) times (abundance of the least abundant
amino acid)/(abundance of the most abundant amino
acid)}.

34. The protein of claim 66, wherein the protein
comprises a first foreign domain recognizing a
first target material and a second foreign domain
recognizing a second target material.

35. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor.
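Claims 24 and 33 state the codon-design criterion as a quantity computed from the amino-acid distribution of a variegated codon. A minimal Python sketch of that quantity follows. It compares an equimolar nnk codon (n = equal parts A, C, G, T; k = equal parts T, G) with the biased qfk mixture whose rounded fractions appear in the table legends. The function names are illustrative, and the fractions are the rounded legend values, not necessarily the exact values the applicant optimized against.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

def distribution(first, second, third):
    """Abundance of each amino acid ('*' = stop) for a codon built from base mixtures."""
    dist = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(first.items(), second.items(), third.items()):
        aa = CODE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist

def figure_of_merit(dist):
    """(1 - abundance(stop)) * (least abundant amino acid) / (most abundant amino acid).

    Assumes every amino acid is encoded at least once (true for nnk and qfk).
    """
    amino_acids = [p for aa, p in dist.items() if aa != "*"]
    return (1 - dist.get("*", 0.0)) * min(amino_acids) / max(amino_acids)

def charge_balance(dist):
    """Return the (acidic D + E, basic K + R) abundances."""
    return dist["D"] + dist["E"], dist["K"] + dist["R"]

equimolar = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}  # the legend's n
k = {"T": 0.5, "G": 0.5}
q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}

for name, dist in [("nnk", distribution(equimolar, equimolar, k)),
                   ("qfk", distribution(q, f, k))]:
    acidic, basic = charge_balance(dist)
    print(f"{name}: merit {figure_of_merit(dist):.3f}, D+E {acidic:.4f}, K+R {basic:.4f}")
```

With these inputs the sketch gives a merit of about 0.323 for nnk (one stop codon in 32, and L, R and S each three times as common as the rarest residues) and about 0.386 for qfk. It also shows why nnk does not meet the first criterion: K + R (4/32) is twice D + E (2/32), whereas the qfk fractions bring them to 0.1202 and 0.120.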
Group I. Claims 1-32, 34 and 35, drawn to a method of producing a
binding protein, and a protein, classified in Class 435, subclass
68 and Class 530, subclass 3S7.

If Group I is elected, an additional election must be made.
Claims 1-2 are generic to a plurality of disclosed patentably
distinct method species comprising those set forth in:

a) claims 3 and 5, wherein the potential binding domains are
muteins of the parental binding domains;

b) claim 27, wherein the potential binding domains are muteins of
binding domains of bovine pancreatic trypsin inhibitor;

c) claim 27, wherein the potential binding domains are muteins of
binding domains of crambin;

d) claim 27, wherein the potential binding domain is a mutein of
ovomucoid;

e) claim 27, wherein the potential binding domain is a mutein of
T4 lysozyme;

f) claim 27, wherein the potential binding domain is a mutein of
hen egg white lysozyme;

g) claim 27, wherein the potential binding domain is a mutein of
ribonuclease;

h) claim 27, wherein the potential binding domain is a mutein of
azurin;

i) claim 4, wherein the potential binding protein is a chimeric
protein;

j) claim 8, wherein the target material's potential binding
domain comprises less than 80 amino acids;

k) claim 9, wherein the target material's potential binding
domains comprise greater than 80 amino acids;

l) claim 10, wherein the target material is a mineral insoluble
in aqueous solution;

m) claim 11, wherein the target is an organometallic compound
that is stable in aqueous solution; and

n) claim 13, wherein the target material is an inactivated
protease.

II. Claim 33, drawn to a method of preparing DNA based on a
mathematical formula, classified in Class 435, subclass 172.3.

DETAILED REASONS FOR HOLDING LACK OF UNITY OF INVENTION:

The inventions are grouped above according to the unity of
invention concept reflected in Rule 13.2.

The process as claimed can be used to make other and
materially different products, as evidenced by each of the species
in Group I. Also, the product as claimed can be made by another
and materially different process, such as chemical peptide
synthesis.

No required additional search fees were timely paid by the
applicant. The international search report is restricted to the
invention first mentioned in the claims, namely generic claims 1
and 2 to the extent they read on species Ia.



